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ABSTRACT 


How  do  we  assign  nouns  correctly  to  their  underlying  case  roles  in  English,  and  how  do  we 
select  an  appropriate  verb  frame  to  assign  these  nouns  to?  How  do  we  know  whether  a  noun 
phrase  is  a  modifier  of  a  preceding  noun  phrase  or  an  argument  of  the  verb?  How  do  we 
select  the  correct  meaning  of  each  noun  in  the  sentence,  and  how  do  we  allow  context  to 
modulate  its  meaning?  How  do  we  know  how  to  handle  new  nouns  and  verbs?  In  this  article 
we  describe  a  simulation  model  that  addresses  these  questions  from  a  perspective  quite 
different  from  the  conventional  perspective  found  in  Computational  Linguistics.  Words  are 
treated  as  patterns  of  activation,  and  knowledge  about  them  is  stored  in  distributed  form,  in 
the  connections  in  a  large  network  of  simple  neuron-like  processing  units.  The  model  exhibits 
considerable  facility  in  dealing  with  the  problems  of  frame  selection,  role  assignment,  disambi¬ 
guation,  etc.  and  suggests  a  natural  way  to  resolve  unappealing  aspects  of  the  idea  that  there 
is  a  fixed  set  of  individuated  case  roles.  So  far,  our  simulation  model  can  only  process  one- 
clause  sentences.  Possible  extensions  to  multi-clause  sentences  are  described.  ■ 
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Like  many  natural  cognitive  processes,  the  process  of  sentence  comprehension  involves 
the  simultaneous  consideration  of  a  large  number  of  different  sources  of  information.  In  this 
chapter,  we  consider  one  aspect  of  sentence  comprehension:  the  assignment  of  the  consti¬ 
tuents  of  a  sentence  to  the  correct  thematic  case  roles.  Case  role  assign  .lent  is  not,  of  course, 
all  there  is  to  comprehension,  but  it  reflects  one  important  aspect  of  the  comprehension  pro¬ 
cess,  namely,  the  specification  of  who  did  what  to  whom. 

Case  role  assignment  is  not  at  all  a  trivial  matter  either,  as  we  can  see  by  considering 
some  sentences  and  the  case  roles  we  assign  to  their  constituents.  We  begin  with  several  sen¬ 
tences  using  the  verb  break : 

( 1 )  The  boy  broke  the  window. 

(2)  The  rock  broke  the  window. 

(3)  The  window  broke. 

(4)  The  boy  broke  the  window  with  the  rock. 

(5)  The  boy  broke  the  window  with  the  curtain. 

We  can  see  that  the  assignment  of  case  roles  here  is  quite  complex.  The  first  noun  phrase 
(NP)  of  the  sentence  can  be  the  Agent  (Sentences  1,  4,  and  5),  the  Instrument  (Sentence  2),  oi 
the  Patient  (Sentence  3).  The  NP  in  the  prepositional  phrase  (PP)  could  be  the  Instrument 
(Sentence  4),  or  it  could  be  a  Modifier  of  the  second  NP,  as  it  is  in  at  least  one  reading  of  Sen¬ 
tence  5.  Another  example  again  brings  out  the  ambiguity  of  the  role  assignment  of  with-NPs: 

(6)  The  boy  ate  the  pasta  with  the  sauce. 

(7)  The  boy  ate  the  pasta  with  the  fork. 

In  (6)  the  with-NP  clearly  does  not  specify  an  Instrument,  but  in  (7)  it  clearly  does. 

Before  we  go  much  further,  it  should  be  said  that  there  is  no  universally  accepted  set  of 
case  roles,  nor  universal  agreement  as  to  the  correct  assignment  of  constituents  to  roles.  We 
have  adopted  conventions  close  to  those  originally  introduced  by  Fillmore  (1968)  in  "The  Case 
for  Case,"  but  we  do  not  think  the  details  arc  of  crucial  importance  to  the  behavior  of  our 
model.  Later  we  will  suggest  ways  in  which  an  extension  of  our  model  might  circumvent  cer¬ 
tain  of  the  difficulties  involved  in  specifying  the  correct  assignment  of  cases. 

These  complications  aside,  it  appears  from  the  examples  that  the  meaning  of  the  words 
in  these  sentences  influences  the  assignment  of  arguments  to  roles.  However,  the  placement  of 
NPs  within  the  sentences  is  also  very  important.  Consider  these  two  cases: 


(S)  The  vase  broke  the  window. 
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(‘J)  The  window  broke  the  vase. 

Here  we  must  rely  on  word-order  constraints.  That  such  constraints  are  very  strong  in  Hnglish 
can  be  seen  from  sentences  like: 

(10)  The  pencil  kicked  the  cow. 

Even  though  semantic  constraints  clearly  would  indicate  that  the  cow  is  a  much  more  likely 
Agent  and  the  pencil  a  much  more  likely  Patient,  Sentence  10  simply  is  not  given  this  interpre¬ 
tation  by  adult  readers  who  are  native  speakers  of  English. 

Word-order  constraints  like  those  illustrated  by  (10)  are  very  strong  in  English,  but  it  is 
important  to  realize  that  such  heavy  reliance  on  such  constraints  is  not  universal.  Bates  and 
MacWhinncy  (in  press;  MaeWhinnev,  Bates,  &  Klicgl,  1984)  have  shown  that  adult  speakers 
of  Italian  will  assign  roles  to  sentences  like  (10)  based  predominantly  on  semantic  constraints;1 
word  order  plays  a  very  limited  role  and  determines  assignment  only  when  semantics  and 
case-marking  inflections  give  no  information. 

As  the  work  of  Bates  and  MaeWhinnev  amply  demonstrates,  case  role  assignment  is 
influenced  by  at  least  three  different  kinds  of  factors:  word  order,  semantic  constraints,  and 
(when  available)  inflectional  morphology.  Reliance  on  any  one  of  these  constraints  is  a  matter 
of  degree,  and  vanes  from  language  to  language.  In  addition  to  these  factors,  there  is  one 
more  that  cannot  be  ignored,  namely,  the  more  global  context  in  which  the  sentence  is 
presented.  Consider,  for  example.  Sentence  11: 

(11)  The  boy  saw  the  girl  with  the  binoculars. 

We  get  one  reading  if  prior  context  tells  us  "A  boy  was  looking  out  the  window,  try  ing  to  sec- 
how  much  he  could  see  with  various  optical  instruments.”  We  get  quite  a  different  one  if  it 
says  "Two  girls  were  trying  to  identify  some  birds  when  a  boy  came  along.  One  girl  had  a  pair 
of  binoculars  and  the  other  did  not.”  Cram  and  Steedman  (1985)  have  experimentally  demon¬ 
strated  contextual  influences  on  parsing  decisions. 

While  the  fact  that  word  order  and  semantic  constraints  both  influence  role  assignment 
has  often  been  acknowledged  (Bevcr,  1970;  l  odor.  Bever.  A  Ciarrett,  1974),  there  are  few  exist¬ 
ing  models  that  go  very  far  toward  proposing  a  mechanism  to  account  for  these  effects.  How¬ 
ever,  there  are  some  researchers  in  language  processing  who  have  tried  to  find  ways  of  bring¬ 
ing  semantic  considerations  into  syntactic  processing  in  one  way  or  another.  One  recent 
approach  has  been  to  rely  on  the  lexicon  to  influence  both  syntactic  processing  and  the 


I  We  use  the  phrase  "semantic  constraints  "  to  refer  to  the  constraints  l.tncu.icr  users  impose  on  the  co  occurrence 
of  constituents  in  particular  roles  in  case  level  representations.  In  the  model,  as  we  shall  see,  these  constraints  arise 
from  the  co  occurrences  of  constituents  in  the  experiences  the  model  is  exposed  to. 
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construction  of  underlying  functional  representations  (Ford,  Bresnan,  &  Kaplan,  1082;  Kaplan  1 

&  Bresnan,  1082;  MacWhinncy  &  Sokolov,  in  press).  Ford  et  al.  (1982)  considered  cases  like 
the  following: 

( 1 2)  The  woman  wanted  the  dress  on  the  rack.  ( 

(13)  The  woman  positioned  the  dress  on  the  rack.  | 

They  noted  that  the  preferred  reading  of  the  first  of  these  had  on  the  rack  as  a  modifier  of  the 
dress,  while  the  preferred  reading  of  the  second  had  on  the  rack  as  a  locative  argument  of  posi¬ 
tioned.  To  account  for  this  difference  in  role  assignment,  they  proposed  two  principles:  (a)  lex¬ 
ical  preference  and  (b)  final  arguments.  Basically,  lexical  preference  establishes  an  expected  j 

argument  structure  (e.g.,  Subject-Vcrb-Object  in  the  case  of  wane,  Subject-Vcrb-Objcct- 
Prcpositional  Object  in  the  case  of  positioned)  by  consulting  an  ordered  list  of  possible  argu¬ 
ment  structures  associated  with  each  verb.  If  a  constituent  is  encountered  that  could  fill  a  slot 
in  the  expected  argument  structure,  the  constituent  is  treated  as  an  argument  of  the  verb.* 

However,  if  a  constituent  is  encountered  that  appears  to  satisfy  the  conditions  on  the  final 
argument  of  the  expected  argument  structure,  its  attachment  is  delayed  to  allow  for  the  incor-  I 

poration  into  the  constituent  of  subsequent  constituents.  Thus,  with  want,  the  NP  the  dress  is 
a  candidate  for  final  argument  and  is  not  attached  directly  as  a  constituent  of  the  VP;  rather,  a 
superordinate  NP  structure  containing  the  dress  on  the  rack  is  ultimately  attached  to  the  VP. 

With  position,  however,  the  dress  would  not  be  the  final  argument,  and  so  is  attached  directly 

to  the  VP  and  closed.  On  the  rack  is  then  available  for  attachment  as  the  final  argument  to  I 

the  VP. 

While  this  scheme  certainly  does  some  of  the  work  that  needs  to  be  done  in  allowing  the 
constraints  imposed  by  the  words  in  a  sentence  to  influence  role  assignment,  we  do  not  think 
it  goes  nearly  far  enough.  For  as  we  saw  in  Sentences  4-7,  the  \Ps  of  a  sentence  also  I 

influence  syntactic  decisions.  Oden  (1978)  has  verified  that  all  three  \Ps  in  sentences  like 
these  influence  subjects’  role-assignment  decisions. 

In  the  literature  on  sentence  processing,  no  one  disputes  that  various  factors  influence 
the  final  reading  that  is  assigned  to  a  sentence.  However,  there  arc  various  views  of  the  way 
in  which  these  factors  are  taken  into  account  on-line.  Kurtzman  (1985)  argues  that  the  pars-  I 

ing  process  is  directly  guided  by  an  ongoing  plausibility  analysis;  Marslen- Wilson  and  Tyler 
(1981)  have  pioneered  this  sort  of  view,  and  they  stress  the  immediacy  with  which  syntactic, 
semantic,  and  pragmatic  considerations  can  all  be  brought  to  bear  on  the  course  of  sentence 
processing.  On  the  other  hand,  Frazier  and  her  colleagues  (e.g.,  Frazier  &  Rayner,  l')S2; 

Rayner,  Carlson,  &  Frazier,  1983)  argue  that  the  syntactic  parser  imposes  its  preferred  struc-  | 

turing  on  the  sentence  based  only  on  syntactic  considerations,  passing  the  results  of  this  pro¬ 
cessing  on  quickly  to  a  thematic  interpreter  that  can  reject  the  syntactic  parse  in  favor  of  a 
thematically  more  appropriate  reading. 


I 
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Whichever  view  one  holds,  it  is  clear  that  a  mechanism  is  needed  in  which  all  the  consti¬ 
tuents  of  a  sentence  can  work  simultaneously  to  influence  the  assignment  of  roles  to  consti¬ 
tuents.  While  we  ourselves  tend  to  favor  a  highly  interactive  view,  the  model  we  will  describe 
here  takes  as  its  input  a  partial  surface  parse  (though  it  is  one  that  leaves  certain  attachment 
decisions  unspecified)  and  generates  from  it  a  case-level  representation.  Intended  extensions  of 
the  model,  which  we  will  describe  below,  would  incorporate  feedback  to  the  syntactic  structure 
level;  but  most  of  the  model’s  behavior  is  not  dependent  on  this  feedback,  and  so  readers  com¬ 
mitted  to  a  less  interactive  view  of  the  relation  between  syntactic  and  thematic  analyses  may 
yet  find  the  model  to  be  of  interest. 


GOALS 

I  he  primary  goal  of  our  model  is  to  provide  a  mechanism  that  can  begin  to  account  for 
the  joint  role  of  word  order  and  semantic  constraints  on  role  assignment.  We  wanted  the 
model  to  be  able  to  learn  to  do  this  based  on  experience  with  sentences  and  their  case 
representations.  We  wanted  the  model  to  be  able  to  generalize  what  it  learned  to  new  sen¬ 
tences  made  up  of  novel  combinations  of  words. 

In  addition,  we  had  several  other  goals  for  the  model: 

•  We  wanted  the  model  to  be  able  to  select  contextually  appropriate  readings  of  ambigu¬ 
ous  words. 

•  We  wanted  the  model  to  select  the  appropriate  verb  frame  based  on  the  pattern  of 
arguments  and  their  semantic  features. 

•  We  wanted  the  model  to  fill  in  missing  arguments  in  incomplete  sentences  with  plausi¬ 
ble  default  values. 

•  We  wanted  the  model  to  be  able  to  generalize  its  knowledge  of  correct  role  assignment 
to  sentences  containing  a  word  it  has  never  seen  before,  given  only  a  ..pacification  of 
some  of  the  semantic  properties  of  the  word. 

The  model  succeeded  in  meeting  all  these  goals,  as  we  shall  sec. 

The  model  also  exhibits  an  additional  property  that  wc  had  not  actually  anticipated, 
even  though  it  is  a  central  characteristic  of  language  understanding:  The  model  exhibits  an 
uncanny  tendency  to  shade  its  representation  of  the  constituents  of  a  sentence  in  ways  that  arc 
contextually  appropriate.  It  does  this  without  any  explicit  training  to  do  so;  in  fact,  it  does 
this  in  spite  of  the  fact  that  the  training  inputs  it  receives  are  not  contextually  shaded  as  they 
would  be  in  reality.  Wc  will  examine  this  aspect  of  the  model's  behavior  through  examples, 
and  observe  how  it  emerges  naturally  from  the  model's  structure. 
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The  model  is,  of  course,  very  far  from  a  complete  or  final  model  of  sentence  processing 
or  even  case  role  assignment.  Perhaps  it  is  best  seen  as  a  partial  instantiation  of  one  view  of 
what  some  properties  of  the  interface  between  syntactic  and  more  conceptual  levels  of 
language  representation  might  be  like.  We  offer  the  model  not  because  it  "solves  the  problem 
of  sentence  comprehension."  Rather,  we  offer  it  because  it  suggests  new  ways  of  thinking 
about  several  aspects  of  language  and  language  representation.  The  simulation  model  that 
embodies  these  ideas  will  undoubtedly  require  substantial  development  and  elaboration.  It  is 
our  belief,  though,  that  the  basic  principles  that  it  embodies  will  prove  extremely  valuable  as 
cognitive  science  continues  to  try  to  come  to  grips  with  the  problem  of  understanding  natural 
language. 

We  have  limited  the  model  in  several  ways.  Most  importantly,  we  have  considered  only 
single  clause  sentences.  We  have  also  considered  only  a  limited  set  of  roles  and  a  limited 
vocabulary.  Since  we  have  restricted  the  analysis  to  T.ngiish.  case  inflectional  morphology  does 
not  arise.  Within  these  bounds,  we  will  see  that  we  have  been  able  to  meet  the  goals  of  the 
model  quite  successfully,  using  a  very  simple  PDP  architecture. 

Previous,  Related  l Pork 

Both  Cottrell  (1985;  Cottrell  &  Small,  1983)  and  Waltz  and  Pollack  (1985)  have  preceded 
us  in  noting  the  appeal  of  connectionism  as  a  means  of  exploiting  the  multiple  constraints  that 
appear  to  influence  both  case  role  assignment  and  the  contextual  disambiguation  of  ambigu¬ 
ous  noun  phrases.  Their  models  ditTer  from  ours  in  several  ways,  most  notably  in  that  both 
rely  primarily  on  local  representations  (one-unit-one-concept)  as  opposed  to  distributed 
representations,  although  Waltz  and  Pollack  (1985)  do  suggest  ways  that  a  distributed 
representation  could  be  used  to  represent  global  contextual  influences  on  word  meaning 
disambiguation.  Within  the  context  of  distributed  models,  ours  builds  on  the  work  of  J.  A. 
Anderson  (1983)  and  Kawamoto  (T9S5):  Both  models  show  how  context  can  be  used  to  select 
the  appropriate  reading  of  an  ambiguous  word.  Our  work  incorporates  mechanisms  quite  like 
theirs  to  accomplish  this  and  other  goals.  Finally,  Hinton's  (1981a)  early  discussion  of  the  use 
of  distributed  representations  to  represent  propositions  played  an  important  role  in  the 
development  of  the  ideas  described  here. 

ARCHITECTURE  OF  THE  MODEL 

The  role-assignment  model  is  a  distributed  model,  and  has  many  properties  in  common 
with  the  verb  learning  model  described  in  Chapter  IS.  The  model  consists  of  two  sets  of  units: 
one  for  representing  the  surface  structure  of  the  sentence  and  one  for  representing  its  case 
structure.  The  model  learns  through  presentations  of  correct  surface-structure  case-structure 
pairs;  during  testing,  we  simply  present  the  surface-structure  input  and  examine  the  output  the 
model  generates  at  the  case-structure  level. 
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Sentences.  The  sentences  processed  by  the  model  consist  ot  a  \erb  and  from  one  to 
three  NPs.  There  is  always  a  Subject  NP,  and  optionallv  there  mas  be  an  Object  NP.  If  this 
is  present,  there  may  also  be  a  with-.XP:  that  is,  a  NP  in  a  sentence-final  prepositional  phrase 
beginning  with  the  word  with.  All  of  the  numbered  sentences  considered  in  the  introduction 
are  examples  of  sentence  types  that  might  be  presented  to  the  model. 

Input  format  of  sentences.  What  the  model  actually  sees  as  input  is  not  the  raw  sen¬ 
tence  but  a  canonical  representation  of  the  constituent  structure  ol'  the  sentence,  in  a  form 
that  could  be  produced  by  a  simple  surface  parser  and  a  simple  lexicon.  Such  a  parser  and 
lexicon  are  not,  in  fact,  parts  of  the  model  in  its  present  form  —  the  sentences  are  simply 
presented  to  the  model  in  this  canonical  format.  We  discuss  ways  such  a  parser  could  be 
implemented  in  a  PDP  model  in  the  discussion  section. 

Semantic  .Microfeatures 

In  the  canonical  input  format,  words  are  represented  as  lists  of  semantic  microfeatures 
(Hinton,  1981a;  sec  Chapter  3;  Walt/  &  Pollack,  1985,  also  make  some  use  of  a  microfcaturc 
representation).  I  or  both  nouns  and  verbs,  the  features  are  grouped  into  several  dimensions. 
Hach  dimension  consists  of  a  set  of  mutually  exclusive  values,  and,  in  general,  each  word  is 
represented  bv  a  vector  in  which  one  and  only  one  value  on  each  dimension  is  ON  for  the 
word  and  all  of  the  other  values  are  Oi  l'.  Values  that  are  set  to  be  ON  are  represented  in  the 
feature  vectors  as  Is.  Values  that  arc  set  to  be  OFF  arc  represented  as  dots 

We  chose  the  dimensions  and  the  values  on  each  dimension  to  capture  what  we  felt  were 
important  dimensions  of  semantic  variation  in  the  meanings  of  words  that  had  implications  for 
the  role  assignments  of  the  words.  We  should  be  very  clear  about  one  point,  though,  which  is 
that  we  do  not  want  to  suggest  that  the  full  range  of  the  phenomena  that  are  described  under 
the  rubric  of  the  "meanings”  of  the  words  are  captured  by  these  semantic  microfeatures. 
Indeed,  we  do  not  think  of  words  as  actually  having  some  fixed  meaning  at  all.  Exactly  how 
we  do  think  of  meanings  will  become  clear  after  we  examine  the  behavior  of  the  model,  so  we 
postpone  a  fuller  consideration  of  this  issue  until  the  discussion. 

The  full  set  of  dimensions  used  in  the  feature  sets  are  given  in  Table  1.  The  noun  dimen¬ 
sions  arc  largely  self-explanatory,  but  the  different  dimensions  of  the  verbs  may  need  some 
explication.  Basically,  these  dimensions  are  seen  as  capturing  properties  of  the  scenario 
specified  by  the  verb.  Thus,  the  DOER  dimension  indicates  whether  there  is  an  Agent  insti¬ 
gating  the  event.  The  CAUSE  dimension  specifies  whether  the  verb  is  causal.  If  not,  it  indi¬ 
cates  whether  this  is  because  there  is  no  cause  specified  tas  in  the  case  of  the  window  broke  I  or 
whether  it  is  because  there  is  no  change  (as  in  the  case  of  the  hoy  touched  the  ytrf).  The 
TOl  (  I I  dimension  indicates  whether  the  Agent,  the  Instrument,  both,  or  neither  touches  the 
Patient;  the  "AisP"  value  simply  indicates  that  the  Agent  and  the  Patient  are  the  same  (as  m 
the  cat  moved).  The  NATCIING  dimension  specifies  the  nature  of  the  change  that  takes 
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Table  1 

FEATURE  DIMENSIONS  AND  VALUES 


Nouns 

HUMAN 

human  nonhuman 

SOFTNESS 

soft  hard 

GENDER 

male  female  neuter 

VOLUME 

small  medium  large 

FORM 

compact  I  D  2D  3  D 

POINTINESS 

pointed  rounded 

BREAK  ABILITY 

fragile  unbreakable 

OBJ  TYPE 

food  toy  tool  utensil  furniture 

animate  nat-inan 

Verbs 

DOER 

yes  no 

CAUSE 

yes  no-cause  no-change 

TOUCH 

agent  inst  b<>fti  none  AisP 

NATCHNG 

pieces  shreds  chemical  none 
unused 

AGTMVMT 

trans  part  none  N  A 

PT_MVMT 

Ir.'ins  pari  none  \A 

INTENSITY 

low  high 

Note:  nat-inan  =  natural  inanimate.  AisP  =  Agent  is  Patient. 
NA  =  not  applicable. 


place  in  the  Patient.  The  AGTMVMT  and  PT.MVMT  specify  the  movement  of  the  Agent 
and  the  Patient,  respectively;  and  INTENSITY  simply  indicates  the  forcefnlness  of  the  action. 
The  labels  given  to  the  dimensions  are,  of  course,  only  for  reference;  they  were  chosen  so  that 
each  noun  or  verb  dimension  would  ha\e  a  unique  first  letter  that  could  be  used  to  designate 
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It  must  be  stressed  that  we  are  not  strongly  wedded  to  this  particular  choice  of  features, 
and  that  other  features  would  need  to  be  included  to  extend  the  model  to  larger  sets  of  nouns 
and  verbs.  On  the  other  hand,  the  features  that  we  did  include  were  carefullv  chosen  because 
thev  seemed  highly  relevant  to  determining  the  case  role  assignments.  For  example,  the  DOFR 
dimension  directly  specifies  whether  there  is  or  is  not  an  Agent.  Thus,  the  features  of  the  verb, 
in  particular,  often  have  direct  case-structural  implications.  (We  would  prefer  a  model  that 
constructed  its  own  semantic  microfeatures  using  back  propagation  [Chapter  S]  or  a  related 
method  for  learning,  but  this  extension  has  not  yet  been  implemented.) 


Figures  1  and  2  give  the  vectors  that  we  assigned  to  each  of  the  words  used  in  the  model. 
It  will  be  immediately  noted  that  some  of  our  encoding  decisions  were  arbitrarv,  and  that 
sometimes  we  seem  to  be  forcing  words  into  molds  that  they  do  not  perfectly  fit.  Further. 


ball 

float 

bb-bat 

bat 

boy 

paperwt 

cheese 

li-chicken 

co-chicKen 

chicken 

curtain 

besk 

bed 

food 

fork 

girt 

hatchet 

hammer 


HU  SO  GNO  VOL 
11  11 


OBJ  TVP 
1 


\  ieure  1  Fhe  nouns  used  in  the  model  and  their  features.  For  ambiguous  noun  constituent*.  the  correct,  lulls 
specified  reading  was  used  in  specifying  what  the  case  role  representaUon  of  the  constituent  should  he.  hut  the 
underspecified,  ambiguous  forms  were  used  in  the  sentence  lex el  input  representation  See  text  for  a  full  discussion 
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each  feature  has  the  same  weight  as  all  the  others,  and  is  as  definite  as  all  the  others.  Reality 
is  not  nearly  so  definite  or  evenhanded,  of  course.  Balls  are  round,  but  may  be  soft  or  hard: 
paperweights  are  generally  compact  in  shape  but  need  not  be,  etc.  The  definiteness  of  the 
input  used  in  the  simulations  is  a  simplification  that  we  have  adopted  to  make  the  initial  cod¬ 
ing  of  the  input  patterns  as  straightforward  as  possible.  A  more  realistic  coding  would  allow 
some  features  to  be  more  definite  than  others.  We  will  see  that  the  model  tends  to  correct 
this  deficiency  on  its  own  accord. 

One  of  our  goals  for  the  model  is  to  show  how  it  can  select  the  contextually  appropriate 
meaning  for  an  ambiguous  word.  For  ambiguous  words  [bat,  flying  or  baseball,  and  chicken. 
living  or  cooked,  the  input  pattern  is  the  average  of  the  feature  patterns  of  each  of  the  two 
readings  of  the  word.  This  means  that  in  cases  where  the  two  agree  on  the  value  of  a  particu¬ 
lar  input  dimension,  that  dimension  has  the  agreed  value  in  the  input  representation.  In  cases 
where  the  two  disagree,  the  feature  has  a  value  of  .5  (represented  by  "?")  in  the  input 
representation.  A  goal  of  the  simulations  is  to  see  if  the  model  can  correctly  fill  in  these 


ate 

aieAVP 

ateAVPI 

ateAVF 

00  CAU  TOUCH 
11.  1 

1  1  .  .  1 

1  1  .1 

11  1 

Broke 

brokeAVPI 

BrokeAVP 

brokelVP 

broke?'/ 

1  1  1 

1  1  1 

1  1  1 

1  1  1 

1  1  1 

Bit 

hitAVPI 

BitAVP 

tlitlVP 

1  1  1 

1  1  1 

1  1  1 

1  1  1 

moved 

movedAVP 

movedAVS 

movedPV 

1  1  1 

1  1  1 

1  1  1 

1  1  1 

touched 

touchedAVPi 

touchedAvP 

toucnedIVP 

1  1  1 

1  1  1 

1  1  1 

1  1  1 

figure  2.  The  verbs  used  in  the  model  and  iheir  microfealurc  representations.  I  he  forms  followed  by  -trines 
of  uppercase  letters  (e.g.,  AVPI)  represent  the  alternative  feature  patterns  that  the  model  must  choose  between  as  ns 
way  of  specifying  the  contextually  appropriate  reading  of  the  verb  These  alternative  feature  patterns  correspond  to 
the  semantic  features  of  the  verb  appropriate  for  parUcular  configurations  of  case  roles,  as  indicated  hy  the  upper 
case  letters:  A  =  \gent.  V  =  Verb.  P  =  Patient.  I  =  Instrument.  \t  =  Modifier.  S  =  Self.  I  =  implied  food. 
The  position  of  the  letter  indicates  the  position  of  the  corresponding  constituent  in  the  input  sentence  1  he  patterns 
given  with  the  generic  verb  unadorned  by  uppercase  letters  were  used  in  the  sentence  level,  input  representations 
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unspecified  values,  effectively  retrieving  the  contextually  appropriate  missing  values  in  the  pro¬ 
cess  of  assigning  the  word  to  the  appropriate  case  role.  Figure  1  indicates  both  "full"  readings 
of  bat  and  chicken,  as  well  as  the  ambiguous  forms  used  as  inputs. 2 

Another  goal  for  the  model  is  to  show  how  it  can  select  the  contextually  appropriate 
reading  of  a  verb.  This  is  handled  in  much  the  same  way  as  noun  ambiguity  resolution.  The 
different  readings  are  represented  by  (potentially)  different  sets  of  semantic  microfcaturcs;  for 
example,  the  Agent ,  No-Instrumcnt  reading  o {broke  (brokcAVP)  involves  contact  between  the 
Agent  and  the  Patient,  while  the  Instrument  No-Agent  version  (broke(VP)  and  the 
Agent  Instrument  version  (brokcAVPI)  involve  contact  between  the  Instrument  and  the 
Patient.  The  input  representation  of  the  features  of  a  given  verb  is  the  same,  regardless  of 
context,  and  the  task  given  to  the  model  is  to  activate  the  set  of  features  for  the  sentence- 
appropriate  version.  Rather  than  use  the  average  pattern  based  on  all  of  the  different  possible 
readings  of  the  verb,  we  used  a  "generic”  pattern  for  each  verb,  which  is  the  pattern  for  what 
we  took  to  be  the  verb's  most  typical  case  frame.  This  is  indicated  in  Figure  2  by  the  pattern 
of  features  next  to  the  plain  verb? 

The  feature  patterns  corresponding  to  the  different  case  frames  the  model  must  choose 
among  are  indicated  on  the  lines  in  the  tabic  following  its  generic  pattern.  (The  labels  on 
these  lines  are  used  simply  to  designate  the  feature  patterns.  They  indicate  the  roles  the  vari¬ 
ous  arguments  in  the  surface  structure  of  the  sentence  play.  Thus,  brokeAVPl  specifics  the 
case  frame  in  which  the  surface  subject  is  the  Agent,  the  surface  object  is  the  Patient,  and  the 
with-NP  is  the  Instrument.)  Note  that  the  microfcaturcs  of  two  different  readings  of  the  same 
verb  may  or  may  not  differ,  depending  on  whether  the  features  of  the  scenario  do  or  do  not 
change  in  different  case  frames. 

The  feature  vectors  for  the  constituents  of  the  sentence  The  boy  broke  the  window  with 
the  hammer  arc  shown  just  below  the  corresponding  constituents  at  the  top  of  Figure  3.  Note 
that  these  are  displayed  in  the  order:  Verb,  Subject  NP,  Object  NP,  and  With-NP.  The  row 
of  letters  below  the  feature  vectors  indicates  the  first  letter  of  the  name  of  the  dimension  on 


2.  For  the  concept  food,  which  is  taken  to  be  the  implied  Patient  in  sentences  like  The  boy  ate.  no  particular  shape 
seems  appropriate.  Therefore  the  intended  output  representation  is  assumed  to  be  unspecified  (as  indicated  by  the 
"?")  for  all  values  on  the  shape  dimension.  For  all  other  dimensions,  food  has  what  we  take  to  be  the  typical  values 
for  foods. 

3.  The  different  handling  of  nouns  and  verbs  is  not  a  principled  distinction,  but  an  exploration  of  two  endpoints  on 
a  continuum  ranging  from  undcrspecification  of  the  input  for  ambiguous  words  to  complete  specification  of  an  input 
representation,  regardless  of  the  fact  that  the  features  used  in  the  case  role  representation  will  differ  as  a  function  of 
context.  Perhaps  the  idea  that  the  features  will  be  altered  as  a  function  of  context  is  the  best  way  of  putting  things  in 
this  case.  We  imagine  that  lire  true  state  of  affairs  is  intermediate  between  these  two  extremes,  for  both  nouns  and 
verbs.  In  any  case,  the  model  does  not  have  any  prior  commitment  to  the  idea  that  the  features  in  the  input 
representation  should  be  preserved  in  the  output  representation;  the  full  prespecilication  simple  gives  the  model  a 
fuller  description  to  work  from,  thereby  allowing  greater  differentiation  of  the  different  verbs. 
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Figure  3.  The  top  line  of  this  figure  displays  the  constituents  of  the  sentence  The  buy  broke  ;he  with 

;he  hummer  in  the  order:  Verb.  Subject  NP,  Object  NP,  and  Wiih-NP  Below  these  are  the  microl'calurcs  of  each 
constituent,  and  below  these  arc  conjunctive  sentence  structure  units  for  each  constituent  Below  the  horizontal  line 
are  the  blocks  of  case -structure  units  for  the  Agent.  Paucnl,  Instrument,  and  Modifier  roles.  Below  these  an  mdi 
caUon  of  the  pattern  of  noun  features  the  model  is  activaung  for  each  slot  (represented  by  the  vertical  black  bars), 
followed  by  a  representation  of  the  microfealures  of  the  correct  filler  of  each  slot.  The  last  line  gives  the  label  of  the 
correct  head  (verb  frame  or  modified  NP)  jnd  tail  (slot  tiller)  for  each  slot.  See  text  for  further  explanation. 

which  each  feature  represents  a  value.  I  or  example,  the  first  two  elements  of  the  verb  feature 
sector  are  labeled  d  for  the  DOIR  dimension;  the  first  two  values  of  each  of  the  three  noun 
feature  vectors  are  labeled  n  for  the  111  MAN  dimension. 

Sentence-structure  units.  The  sentence-structure  level  representation  of  an  input  sen¬ 
tence  is  not  actually  the  set  of  constituent  feature  vectors;  rather,  it  is  the  pattern  of  activation 
these  vectors  produce  over  units  that  correspond  to  pairs  of  features.  1'hese  units  are  called 
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sentence-structure  (SS)  units.4 

Each  SS  unit  represents  the  conjunction  of  two  microfeatures  of  the  filler  of  a  particular 
surface  role.  Since  there  are  four  sentence-structure  roles,  there  are  four  sets  of  SS  units. 
Within  each  set  there  is  a  unit  that  stands  for  the  conjunction  of  even,'  microfeature  value  on 
each  dimension  with  every  microfeature  value  on  every  other  dimension.  For  example,  for 
nouns  there  arc  units  for: 


HI  MAN  =  yes  GFNDFR  =  male 
SOLIDITY  =  hard  BREAKABILITY  =  fragile 

among  many  others;  for  the  verb,  one  of  the  units  corresponds  to 

DOER  =  yes  TOUCH  =  instrument 

(i.c.,  there  is  a  doer  —  the  Instrument  touches  the  Patient). 

The  sentence-structure  units  are  displayed  in  Figure  3  in  four  roughly  triangular  arrays. 
The  verb  array  is  separated  from  the  arrays  for  the  three  NPs  to  indicate  that  different  features 
arc  conjoined  in  the  verb  and  NP  representations. 

Each  array  contains  the  conjunctive  units  for  the  constituent  immediately  above  it.  There 
is  a  unit  wherever  there  is  a  1,  a  ”?”,  or  a  Within  each  array,  the  units  are  laid  out  in 
such  a  way  that  the  column  a  unit  is  in  indicates  one  of  the  microfeaturcs  that  it  stands  for. 
and  the  row  it  is  in  indicates  the  other  microfeature.  Rows  and  columns  arc  both  ordered  in 
the  same  way  as  the  microfeature  vectors  at  the  top  of  the  figure.  The  dimensions  arc  indi¬ 
cated  by  the  row  of  letters  across  the  top  of  each  array  and  along  the  left  (for  the  verb  units) 
or  right  (for  the  three  sets  of  NP  units).  Note  that  the  set  of  units  in  each  array  fills  less  than 
half  of  each  block  for  two  reasons,  f  irst,  there  are  only  | n(n  -  1)|  2  distinct  pairs  of  n 
features;  second,  pairs  of  values  on  the  same  dimension  arc  not  included. 

We  considered  various  schemes  for  activating  the  sentence-structure  units.  One  possible 
scheme  would  be  to  use  a  strict  deterministic  activation  rule,  so  that  a  particular  SS  unit 
would  be  turned  on  only  if  both  of  the  features  the  unit  stands  for  were  on  in  the  feature  vec¬ 
tor.  This  use  of  the  SS  units  would  allow  the  model  to  learn  to  respond  m  a  finely  tuned  way 
to  particular  conjunctions  of  microfeaturcs.  However,  vve  wished  to  sec  how  well  the  model 
could  function  using  an  inherently  noisy  input  representation.  Furthermore,  as  discussed  in 


4  An  alternative  name  for  these  units  would  he  "surface-structure  "  units,  lo  indicate  that  they  do  not  capture  the 
notion  of  underlying  subject,  object,  etc.  However,  we  have  chosen  the  term  "sentence  structure "  because,  for 
present  purposes,  the  information  they  capture  is  not  even  a  full  surface  structure  parse  of  the  sentence:  in  particular. 
U  does  not  specify  the  attachment  of  Uie  with-NP. 
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Chapter  IS,  we  knew  that  generalization  is  facilitated  when  units  that  only  partially  match  the 
input  have  some  chance  of  being  activated.  In  the  present  case,  we  considered  it  important  to 
be  able  to  generalize  to  words  with  similar  meanings.  Therefore,  the  SS  units  were  treated  as 
stochastic  binary'  units,  like  the  units  used  in  Chapter  18.  Fach  SS  unit  received  excitatory 
input  from  each  of  the  two  features  that  it  stands  for,  and  we  set  the  bias  and  variance  of  the 
units  so  that  when  both  of  a  SS  unit's  features  were  active,  the  unit  came  on  with  probability 
.S5;  and  when  neither  was  active,  it  came  on  with  probability  .15.  These  cases  are  represented 
in  the  figure  by  "1"  and  respectively.  Units  receiving  one  excitatory  input  came  on  with 
probability  .5;  these  units  are  represented  in  Figure  3  by  "?". 

The  use  of  the  SS  units  in  conjunction  with  these  particular  activation  assumptions 
means  that  the  input  representation  the  model  must  use  as  the  basis  for  assigning  words  to 
case  roles  is  both  noisy  and  redundant.  Hach  feature  of  the  input  is  represented  in  the  activa¬ 
tion  of  many  of  the  SS  units,  and  no  one  of  these  is  crucial  to  the  representation.  A  drawback 
of  these  particular  activation  assumptions,  however,  is  that  they  do  not  allow  the  model  to 
learn  to  respond  to  specific  conjunctions  of  inputs.  While  the  model  docs  well  in  our  present 
simulations,  we  presume  that  simulations  using  a  larger  lexicon  would  require  greater 
differentiation  of  some  of  the  noun  and  verb  representations.  To  handle  such  cases,  we  believe 
it  would  be  necessary  to  allow  tuning  of  the  input  connections  to  the  SS  units  via  back  propa¬ 
gation  (Chapter  8)  so  that  greater  differentiation  can  be  obtained  when  necessary.  In  princi¬ 
ple,  also,  higher-order  conjunctions  of  microfeatures  might  sometimes  be  required.  Our  use  of 
broadly  tuned,  pair-wise  conjunctive  units  illustrates  the  style  of  representation  that  we  think  is 
appropriate  for  the  input,  but  the  present  version  is  only  an  approximation  to  what  we  would 
expect  a  model  with  a  tunable  input  representation  to  build  for  itself. 

Case  role  representation.  The  case  role  representation  takes  a  slightly  different  form 
than  the  sentence-structure  representation.  To  understand  this  representation,  it  is  useful  to 
drop  back  to  a  more  abstract  viewpoint,  and  consider  more  generally  how  we  might  represent 
a  structural  description  in  a  distributed  representation.  In  general,  a  structural  description  can 
be  represented  by  a  set  of  triples  of  the  form  (A  R  B)  where  A  and  B  correspond  to  nodes  in 
the  structural  description,  and  R  stands  for  the  relation  between  the  nodes.  For  example,  a 
class-inclusion  hierarchy  can  be  represented  by  triples  of  the  form  (X  IS- A  V),  where  X  and  Y 
are  category  names.  Any  other  structural  description,  be  it  a  syntactic  constituent  strvicture.  a 
semantic  constituent  structure,  or  anything  else,  can  be  represented  in  just  this  way. 
Specifically,  the  case  role  assignment  of  the  constituents  of  the  sentence  The  hoy  broke  the  win¬ 
dow  with  the  hammer  can  be  represented  as: 

Broke  Agent  Boy 

Broke  Patient  Window 

Broke  Instrument  Hammer 

The  constituent  structure  of  a  sentence  such  as  The  hoy  ate  the  pasta  with  the  sam  e  would  be 
represented  by: 
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Ate  Agent  Boy 
Ate  Patient  Pasta 
Pasta  Modifier  Sauce 

In  a  localist  representation,  we  might  represent  each  of  these  triples  by  a  single  unit, 
liach  such  unit  would  then  represent  the  conjunction  of  a  particular  head  or  left-hand  side  of  a 
triple,  a  particular  relation,  and  a  particular  tail  or  right-hand  side.  Our  more  distributed 
approach  is  to  allocate  groups  of  units  to  stand  for  each  of  the  possible  relations  (or  roles), 
namely.  Agent,  Patient,  Instrument,  and  Modifier,  and  to  have  units  within  each  group  stand 
for  conjunctions  of  rrucrofeaturcs  of  the  first  and  third  arguments  (the  head  and  the  tail)  of  the 
triple.  Thus,  the  triple  is  represented  not  by  a  single  active  unit,  but  by  a  pattern  of  activation 
over  a  set  of  units. 

In  our  implementation,  there  is  a  group  of  units  for  each  of  the  four  relations  allowed  in 
the  case  structure.  In  Figure  3,  the  Agent,  Patient,  Instrument,  and  Modifier  groups  arc  laid 
out  from  left  to  right.  Within  each  group,  individual  units  stand  for  conjunctions  of  one 
microfcature  of  the  head  of  each  relation  with  a  microfcature  of  the  tail  of  each  relation. 
Thus,  for  example,  Brokc-Agcnt-Boy  is  represented  by  a  pattern  of  activation  over  the  left¬ 
most  square  block  of  units.  The  unit  in  the  / th  row  and  /th  column  stands  for  the  conjunction 
of  feature  /  of  the  verb  with  feature  j  of  the  noun.  Thus  all  the  units  with  the  same  verb 
feature  are  lined  up  together  on  the  same  row,  while  all  the  units  with  the  same  noun  feature 
arc  lined  up  together  in  the  same  column.  For  the  Modifier  group,  the  unit  in  the  /  th  row  and 
yth  column  stands  for  the  conjunction  of  feature  /  of  the  modified  \P  and  feature  j  of  the 
modifier  NP.  Tetters  indicating  the  dimension  specifications  of  the  units  are  provided  along 
the  side  and  bottom  edges. 

The  figure  indicates  the  net  input  to  each  case  role  unit  produced  at  the  end  of  the  train¬ 
ing  described  below,  in  response  to  the  sentence  The  boy  broke  the  window  with  the  hammer. 
(We  will  see  very  shortly  how  these  net  inputs  arc  produced.)  As  before,  a  1  indicates  that  the 
net  input  would  tend  to  turn  the  unit  on  with  probability  (p)  greater  than  or  equal  to  .85,  and 
a  indicates  that  the  net  input  would  tend  to  turn  it  on  with  probability  of  .15  or  less.  A 

”+  ”  indicates  that  the  net  input  has  a  tendency  to  turn  the  unit  on  (.85  >  p  >  .5),  and  a 
indicates  that  the  net  input  has  a  tendency  to  turn  the  unit  ofT(.5  >  p  >  15). 

The  correct  case-frame  interpretation  nf  the  sentence  is  provided  to  the  model  by  a 
specification  that  lists,  for  each  of  the  four  possible  case  roles,  the  label  corresponding  to  the 
head  and  tail  of  the  role.  These  are  shown  below  each  of  the  four  blocks  of  case  role  units. 
The  is  used  to  indicate  a  null  slot  filler,  as  in  the  Modifier  role  in  the  present  example. 
From  this  it  is  possible  to  compute  which  units  should  be  on  in  the  case  role  representation. 
Here  we  simply  assume  that  all  the  correct  conjunctions  should  be  turned  on  and  all  other 
units  should  be  off. 


I 
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In  this  example,  the  pattern  of  net  inputs  to  the  case  role  units  corresponds  quite  closely 
to  the  correct  case  role  representation  of  the  sentence.  The  features  of  boy  may  be  seen  in  the 
columns  of  the  block  of  Agent  units;  the  features  of  window  in  the  columns  of  the  block  of 
Patient  units;  and  the  features  of  hammer  in  the  columns  of  the  block  of  Instrument  units. 
The  features  of  the  Agent-Verb-Patient  reading  of  the  verb  broke  can  be  seen  in  the  rows  of 
each  of  these  three  sets  of  units.  There  arc  no  features  active  in  the  fourth  set  of  units,  the 
Modifier  units,  because  there  is  no  Modifier  in  this  case.  In  both  the  Agent  and  the  Patient 
slots,  the  model  tends  to  turn  on  (/>  -  5)  all  the  units  that  should  be  on,  and  tends  to  turn  otF 
(/>  <  .5)  all  the  units  that  should  be  oil'.  In  the  Instrument  slot,  there  arc  some  discrepancies; 
these  arc  indicated  b\  blackening  the  background  for  the  offending  units.  All  of  the  discrepan¬ 
cies  arc  relatively  mild  in  that  the  unit  has  either  a  weak  tendency  to  go  on  when  it  should  not 
( +•  on  a  black  background)  or  to  go  off  when  it  should  be  on  (  -  on  a  black  background). 

Several  things  should  be  said  about  the  case-frame  representations.  The  first  thing  is 
that  the  slots  should  not  be  seen  as  containing  lexical  items.  Rather,  they  should  be  seen  as 
containing  patterns  that  specify  some  of  the  semantic  properties  assigned  by  the  model  to  the 
entities  designated  by  the  words  in  the  sentences.  Thus,  the  pattern  of  feature  values  for  the 
verb  break  specifies  that  in  this  instance  there  is  contact  between  the  Instrument  and  the 
Patient.  This  would  also  be  the  case  in  a  sentence  like  The  hammer  broke  the  window.  How¬ 
ever,  in  a  sentence  like  The  boy  broke  the  window,  with  no  Instrument  specified,  the  pattern  of 
feature  values  specifies  contact  between  the  Agent  and  the  Patient.  Thus,  the  verb  features 
provide  a  partial  description  of  the  scenario  described  by  the  sentence.  The  noun  features, 
likewise,  provide  a  partial  description  of  the  players  (to  use  Fillmore’s  analogy)  in  the  scenario, 
and  these  descriptions,  as  we  will  see  later  on.  may  actually  be  modulated  by  the  model  to  take 
on  attributes  appropriate  for  the  scenario  in  question. 

Details  of  Sentence  Processing  and  Learning 

The  model  is  very  much  like  the  verb  learning  model  (Chapter  IS).  When  a  sentence  is 
presented,  a  conventional  computer  program  front-end  determines  the  net  input  to  each  of  the 
sentence-structure  units,  based  on  the  feature  vectors  of  the  words.  Fach  of  these  units  is  then 
turned  on  probabilistically,  as  described  above.  Fach  surface-structure  unit  has  a  modifiable 
connection  to  each  of  the  case-structure  units.  In  addition,  each  case-structure  unit  has  a 
modifiable  bias  (equivalent  to  a  connection  from  a  special  unit  that  is  always  on).  Based  on 
the  sentence-structure  pattern  and  the  current  values  of  the  weights,  a  net  input  to  each  case- 
structure  unit  is  computed;  this  is  just  the  sum  of  the  weights  of  the  active  inputs  to  each  unit 
plus  the  bias  term.  Case-structure  units  take  on  activation  values  of<>  and  1.  and  activation  is 
a  probabilistic  function  of  the  net  input,  as  in  the  verb  learning  model. 


During  learning,  the  resulting  activation  of  each  case-structure  unit  is  compared  to  the 
value  it  should  have  in  the  correct  reading  of  the  sentence.  The  correct  reading  is  supplied  as 
a  "teaching  input"  specifying  which  of  the  case  role  units  should  be  on.  The  idea  is  that  this 
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Sentence  f  rame 

Argument  Assignment 

T  he  human  ate. 

A  XT 

The  human  ate  the  food. 

A  VP 

l  he  human  ate  the  food  with  the  food. 

AV PM 

The  human  ate  the  food  with  the  utensil. 

AV  pi 

The  animal  ate. 

AVI 

1  he  predator  ate  the  prey. 

A  VP 

The  human  broke  the  fragile  object. 

A  VP 

lhe  human  broke  the  fragile  object  with  the  breaker. 

AVPI 

The  breaker  broke  the  fragile  object. 

IVP 

Die  animal  broke  the  fragileohject. 

A  VP 

The  fragile  ohject  broke 

PV 

lhe  human  hit  the  thing 

A  VP 

l  he  human  hit  the  human  with  the  possession. 

AVPM 

The  human  hit  the  thing  with  the  hitter 

AVPI 

l  he  hitter  hit  the  thing. 

IVP 

The  human  moved. 

A  VS 

The  human  moved  the  object. 

A  VP 

The  animal  moved. 

A  VS 

The  object  moved. 

PV 

Nole:  Argument  assignments  speufy  the  case  role  assignment  of  Lhe  constituents  of 
a  sentence  from  left  to  right  A  -  Agent,  V  =  S  erb.  P  =  Patient.  I  =  Instrument, 
\1  =  Modifier,  I  =  (implied)  f  ood,  8  =  Self. 


teaching  input  is  analogous  to  the  representation  a  real  language  learner  would  construct  of 
the  situation  in  which  the  sentence  might  have  occurred.  I. earning  simply  amounts  to  adjust¬ 
ing  connection  strengths  to  make  the  output  generated  by  the  model  correspond  more  closely 
to  the  teaching  input.  As  in  the  verb  learning  model,  if  a  unit  should  be  active  and  it  is  not, 
the  weights  on  all  the  active  input  lines  are  incremented  and  the  threshold  is  decremented.  If  a 
unit  should  not  be  active  but  it  is,  the  weights  on  all  the  active  output  lines  are  decremented 
and  the  threshold  is  incremented.  This  is,  of  course,  just  the  perceptron  convergence  pro¬ 
cedure  (Rosenblatt,  1962),  whose  strengths  and  weaknesses  have  been  examined  and  relied 


I 
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upon  throughout  the  book. 


SIMULATION  EXPERIMENTS  i 

i 

t 

The  most  important  thing  about  the  model  is  the  fact  that  its  response  to  new  inputs  is  : 

strictly  dependent  upon  its  experience.  In  evaluating  its  behavior,  then,  it  is  important  to  have  * 

a  clear  understanding  of  what  it  has  been  exposed  to  during  learning.  We  have  done  a 
number  of  different  experiments  with  the  model,  but  we  will  focus  primarily  on  one  main 
experiment. 

fhe  main  experiment  consisted  of  generating  a  corpus  of  sentences  derived  from  the  sen-  | 

fence  frames  listed  in  Table  2.  It  must  be  emphasized  that  these  sentence  frames  were  simply 
used  to  generate  a  set  of  legal  sentences,  liach  frame  specifics  a  verb,  a  set  of  roles,  and  a  list 
of  possible  fillers  of  each  role.  Thus,  the  sentence  frame  The  human  broke  the  fragile _object 
with  the  breaker  is  simply  a  generator  for  all  the  sentences  in  which  human  is  replaced  with  one 
of  the  words  on  the  list  of  humans  in  Table  3 ,  fragile _object  is  replaced  with  one  of  the  words  j 

on  the  list  of  fragile  objects  in  Table  3,  and  breaker  is  replaced  with  one  of  the  words  on  the 
list  of  breakers  in  the  table.  It  is  clear  that  these  generators  do  not  capture  all  of  the  subtle 
distributional  properties  of  referents  in  real  scenarios  (e.g.,  the  model  is  completely  sex  and  age 
neutral  when  it  comes  to  hitting  and  breaking  things,  contrary  to  reality),  and  so  we  cannot 
expect  the  model  to  capture  all  these  subtleties.  However,  there  arc  certain  distributional  facts 
implicit  in  the  full  ensemble  of  sentences  encompassed  by  the  generators.  For  example,  all  the  I 

breakers  but  one  are  hard,  not  soft  (only  ball  is  coded  as  soft  in  the  feature  patterns);  only  the 
humans  enter  as  Agents  into  scenarios  involving  Instrument  use;  etc. 

The  "target"  case-frame  representations  of  the  sentences  were  generated  along  with  the 
sentences  themselves.  The  case  role  assignments  are  indicated  in  Table  2  by  the  sequence  of  | 

capital  letters.  These  indicate  the  assignment  of  arguments  from  the  sentences  to  the  roles  of 
Agent,  Verb,  Patient,  Instrument,  and  Modifier  (of  the  Patient).  5  Note  that  there  are  some 
sentences  that  could  be  generated  by  more  than  one  generator.  Thus,  The  boy  hit  the  girl  with 
the  ball  can  be  generated  by  the  generator  The  human  hit  the  human  with  the  possession,  in 
which  case  the  ball  is  treated  as  a  Modifier  of  the  Patient.  Alternatively,  it  may  be  generated  . 

by  the  generator  The  human  hit  the  thing  with  the  hitter.  In  this  case,  the  ball  is  treated  as  the 
Instrument.  Similarly,  The  bat  broke  the  vase  can  be  generated  by  The  breaker  broke  the 
fragile _ohject,  in  which  case  its  case-frame  representation  contains  a  baseball  bat  serving  as 
Instrument,  fhe  same  sentence  can  also  be  generated  b\  The  animal  broke  the  fragile  object. 


5.  Two  spcci.il  cases  should  he  noted:  f  or  Vhe  human  aie.  the  case  frame  contains  a  specification  (I  )  that  desig¬ 
nates  an  implied  Patient  that  is  the  generic  food  with  unspecified  shape,  as  indicated  in  the  feature  patterns  displayed 
in  f  igure  I.  f  or  the  human  moved  and  Fhe  animat  moved,  the  case  frame  contains  a  specification  (S)  that  indicates 
that  there  is  an  implied  Patient  who  is  the  same  as  the  \gent  (note  dial  the  sense  of  the  serb  move  used  here  in¬ 
volves  moving  oneself  and  not  one’s  possessions). 
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in  which  case,  of  course,  its  case-frame  representation  contains  a  flying  hat  serving  as  Agent. 

l  or  the  main  experiment,  wc  generated  all  the  sentences  covered  by  the  generators  and 
then  selected  eight  of  each  type  to  use  as  training  sentences.  Of  these  we  selected  two  to  be 
familiar  test  sentences.  In  addition,  we  selected  two  additional  sentences  from  each  generator 
to  be  used  as  novel  test  sentences.  These  sentences  were  never  used  to  train  the  model.1’ 

The  model  was  given  50  cycles  of  training  with  the  set  of  training  sentences.  On  each 
cycle,  each  sentence  was  presented,  the  model’s  response  to  it  was  generated,  and  connection 
strengths  were  adjusted  according  to  the  perceptron  convergence  procedure. 

Table  3 

NOL  N  CAI  IX'.ORO  S 


human 

animal 

object 

thin 

predator 

prey 

food 

utensil 

Iraglle  object 
hitter 
breaker 
possession 


man  woman  boy  girl 

0  hat  li  chicken  dog  wolf  sheep  lion 

hall  bh  hat  paperwt  cheese  co  chicken  curtain 
desk  doll  fork  hatchet  hammer  plate  rock 
pasta  spoon  carrot  vase  window 

human  animal  object 

wolf  lion 

li  chicken  sheep 

co-chicken  cheese  spaghetti  carrot 

fork  spoon 

plate  window  vase 

ball  bh  bat  paperwt  hatchet  hammer  rock  vase 
paperwt  ball  bh  bat  hatchet  hammer  rock 
hall  dog  hb  hat  doll  hatchcl  hammer  vase 


b  Some  of  the  generators  (e  g  .  I  he  hum, in  hi;  'he  :him  with  the  hitter\  generate  rather  large  numbers  of  different 
sentences  (in  this  case,  ~>b).  but  others  te  g  .  I  he  human  are.  I  he  predator  ate  the  prey)  generate  onlv  verv  small 
numbers  ol  sentences  (four  in  e.nh  ol  these  cases i  I  he  training  materials  contained  four  copies  of  each  of  two  of 
these  sentences  so  that  even  here,  there  were  two  familiar  test  sentences  and  two  unfamiliar  lest  sentences. 


McClelland  and  kawamoto 
.April  30.  1986 


Sentence  Processing 

21 


After  the  5th,  10th.  20th,  30th,  40th,  and  50th  training  cycles,  the  model  was  tested  on 
both  the  familiar  and  the  novel  test  sentences.  No  learning  occurred  during  the  tests,  so  that 
the  response  to  the  novel  test  sentences  always  represented  generalization  from  the  training 
materials  rather  than  the  effects  of  direct  experience. 

Basic  Results 

Figure  4  gives  a  very  global  indication  of  the  model’s  performance  on  both  the  familiar 
and  the  novel  test  sentences  at  each  phase  of  testing.  The  figure  indicates  the  average  number 
of  incorrect  rrucrofcatures  produced  by  the  model  as  aTunction  of  learning  trials,  for  both  the 
familiar  and  unfamiliar  test  sentences.  There  are  a  total  of  2500  case  role  units,  so  on  the 
average  the  model  is  getting  over  95° o  correct,  even  with  unfamiliar  sentences,  after  the  first  5 
learning  cycles,  and  is  down  to  about  1%  error  at  Cycle  50.  However,  these  statistics  are 
somewhat  misleading,  since  on  the  whole,  most  of  the  case  role  units  should  be  off.  A  more 
realistic  indication  of  the  absolute  performance  level  is  provided  by  the  observation  that 
between  56  and  168  of  the  units  should  be  on  in  the  correct  case  role  representation  of  each 
sentence.  In  general,  the  errors  that  the  model  does  make  are  about  evenly  distributed 
between  sins  of  commission  (false  alarms)  and  sins  of  omission  (incorrect  rejections).  Thus, 
on  average,  at  the  end  of  50  cycles,  the  model  is  turning  on  about  S5  of  the  approximately  loo 
microfeatures  that  should  be  on.  and  is  turning  on  about  15  out  of  2400  microfeatures  that 
should  be  off.  This  corresponds  to  a  d  of  about  3.5. 


f  igure  4  \verage  number  of  incorrect  microfealurcs  produced  as  a  function  oi Amount  of  learning  expcrieiu  e 
(in  number  of  cvcles  Ihroueh  ihe  full  list  ol  trainitie  sentences). 
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Two  things  arc  apparent  in  the  graph,  hirst,  there  is  a  smooth  and  continuing  improve¬ 
ment  m  performance  for  both  familiar  and  unfamiliar  sentences.  Second,  there  is  an  advan¬ 
tage  for  sentences  that  the  model  has  actually  seen  before,  but  it  is  not  particuiarlv  great.  As 
we  shall  see.  there  is  considerable  variability  in  the  model's  ability  to  deal  with  particular  sen¬ 
tences;  only  a  part  of  it  is  accounted  for  by  whether  a  particular  sentence  happens  to  be  old  or 
new. 


Figures  5  and  o  indicate  the  model's  performance  on  each  of  the  38  familiar  and  unfami¬ 
liar  test  sentences,  at  each  of  the  six  tests. 


“MICifl  SENTENCES 
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Hgure  5  Number  of  mierofe.iture  errors  lor  each  familiar  sentence,  alter  5.  11).  21).  11).  41).  and  50  odes 
through  the  set  of  training  stimuli.  Sentences  are  written  in  SYOW  i\V  =  with  \Pt  order  to  facilitate  reading:  the 
column  labeled  "V  erb  F  rame"  indicates  the  role  assignments  of  these  arguments.  Hie  "  indicates  a  sentence  that  is 
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In  what  follows,  we  will  focus  attention  on  the  performance  of  the  model  at  the  end  of 
50  learning  cycles. 

L'se  of  Semantic  and  Word-Order  Cues 

To  assess  the  model's  ability  to  make  use  of  word-order  and  semantic  cues  to  role  assign¬ 
ment.  we  examined  its  performance  on  the  verbs  hit  and  break.  In  the  case  of  break,  con¬ 
straints  are  ver\  important;  in  the  case  of  hit,  role  assignment  is  sometimes  determined  bv 
word  order  alone. 

The  dog  broke  the  plate,  the  hammer  broke  the  vase,  and  the  window  broke.  The 
verb  break  can  take  its  Instrument,  its  Agent,  or  even  its  Patient  as  its  Subject.  In  the  first 
two  cases  (as  in  The  dog  broke  the  plate  and  Fhe  hammer  broke  the  vase i,  it  is  only  the  seman¬ 
tic  properties  of  the  subject  that  can  tell  us  whether  it  is  Agent  or  Instrument. 
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1'hese  two  sentences  happened  to  be  among  the  unfamiliar  test  sentences  generated  by 
the  generators  The  animal  broke  the  fragile _object  and  The  breaker  broke  the  fragile  _obieci. 
The  patterns  of  activity  produced  by  the  model  in  response  to  each  of  these  two  sentences  are 
shown  m  Figures  7  and  8.  We  can  see  that  in  the  case  of  The  Jog  broke  the  plate,  there  is  a 
strong  pattern  of  activation  over  the  Agent  units,  while  in  the  case  of  The  hammer  broke  the 
.use,  there  is  a  strong  pattern  of  activation  over  the  Instrument  units.  In  fact,  the  pattern  of 
activation  over  the  Agent  units  in  the  first  case  corresponds  closely  to  the  expected  target  pat¬ 
tern  for  brokeA VP-agent-dog.  and  the  pattern  of  activation  over  the  Patient  units  in  the 
second  case  corresponds  closely  to  the  expected  target  pattern  for  brokel VP-instrument- 
hammer.  Thus,  the  model  has  correctly  assigned  the  word  Jog  to  be  Agent  in  the  first  case. 


I  ieurc  ~.  Display  of  the  stale  of  the  model  alter  processing  the  sentence  /  V  Jcc  br-\h.e  :he  :\a.v 

I  he  others  were  The  hot  broke  the  v,ise  and  l"ne  boil  broke  the  .-,:.\e 
later 


Kich  of  thoM?  sentences  will  he  discussed 
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and  the  word  hammer  to  be  instrument  in  the  second.  A  summary  display  that  makes  the 
same  point  is  shown  below  the  case  role  representations  in  Figures  7  and  S.  and  is  repeated, 
for  these  and  other  sentences,  in  Figure  9. 

The  summary  display  indicates  in  histogram  form  the  features  of  the  nght-hand  side  of 
the  triple  stored  in  the  corresponding  block  of  units.  The  summary  pattern  for  the  Agent  role 
in  the  sentence  The  dog  broke  the  plate  is  shown  below  the  block  of  Agent  units  in  Figure  ", 
The  pattern  just  below  this  indicates  the  ’’correct”  pattern;  in  this  instance,  this  pattern  is  just 
the  feature  pattern  for  dog.  In  this  case  we  can  see  that  the  pattern  the  model  produced  is 
very  close  to  the  correct  answer. 5 


»■»..»  .  i  .  i 

’  7  .  *  ,.7  ...  V. 

1 7 1  r»-»  t  ■»-»  i  ■»•»•» ->■»  t  ?•> 


f  D®o©oo©oo«a<»  *v»tJ9*jvvvf  f*  tu^oocooooo 


I  igure  X.  Display  of  the  state  of  the  model  .liter  processing  the  sentence  fhe  hammer  *>*•,<*«•  the  .  ..a. 


X.  The  summaries  are  based  on  average  probability  of  activation,  with  the  average  onlc  taken  over  the  subset  of  tin 
us  associated  with  a  particular  noun  feature  that  is  also  associated  with  the  features  ol  the  correct  verb  frame  thus, 
lor  example,  there  are  25  units  for  the  feature  nonhuman  in  each  of  the  four  case  slots.  Ihcsc  units  are  displaced  as 
the  second  column  of  units  in  each  slot;  the  ones  that  indicate  whether  tins  feature  holds  for  the  \ cent  are  shown  in 
the  second  column  ol  the  Agent  units,  the  lett  most  block  Seven  of  these  units  conjoin  tills  feature  with  a  leuttire  ■  '! 
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f igure  9  Summary  of  the  activations  produced  on  the  case  role  units  by  the  sentences  The  Uo g  broke  the 
plate,  t  he  hammer  broke  the  ease,  and  The  plate  broke. 


Below  each  feature-pattern  summary,  we  present  in  quantitative  form  a  measure  of  the 
similarity  of  all  possible  nouns  from  the  sentence  with  the  summary  pattern  for  each  slot.  We 
also  compare  the  contents  of  each  slot  to  the  ’V  pattern,  corresponding  to  "all  units  oil'." 
The  numbers  (called  deviation  scores)  indicate  the  mean  squared  deviation  of  the  indicated 
noun  pattern  from  the  summary  representation  of  the  contents  of  each  slot.  A  small  deviation 
score  indicates  a  close  correspondence.  Thus,  if  we  look,  at  the  row  of  numbers  labeled  dog. 
we  can  see  again  that  the  pattern  over  the  Agent  units  (first  column)  is  very  similar  to  the  pat¬ 
tern  for  Jog.  Further,  we  can  see  that  the  pattern  lor  dog  is  not  similar  to  the  patterns  in  any 
of  the  other  slots  (second  through  fourth  columns).  The  pattern  on  the  Patient  units  is  similar 
to  the  pattern  for  plate,  and  the  patterns  on  the  Instrument  and  Modifier  units  are  similar  to 
the  null  pattern.  For  the  sentence  Hie  hammer  broke  the  vase,  the  pattern  on  the  Instrument 
units  is  similar  to  the  pattern  for  hammer,  the  pattern  on  the  Patient  units  is  similar  to  the 
pattern  for  ruje,  and  the  patterns  on  the  Agent  and  Modifier  units  are  similar  to  the  blank 
pattern.  Thus,  each  argument  has  effectively  been  assigned  the  correct  role,  and  each  role  has 
elTectiveiy  been  assigned  the  correct  filler. g 


the  verb  frame  brokcAVP  In  ihe  case  of  The  doe  broke  the  plate,  illustrated  in  figure  '.  most  >1  these  units  have  a 
tendency  to  come  on  (.5  <  p  <  .85),  while  two  (the  last  two)  have  a  tendency  to  stay  .<tf  15  ■  p  ■  >l  -  a  ten 
Jenc.  that  is  considered  to  be  erroneous  and  results  in  these  units  having  black  backgrounds  t  he  average  of  these 
probabilities  is  about  .65.  This  value  is  translated  into  the  height  of  the  bar  in  the  second  position  of  the  Agent  slot 
for  the  sentence  The  Jog  broke  the  plate. 

9  With  respect  to  the  summary  pattern  on  the  Instrument  slot  for  The  hammer  broke  the  xtiao ic.  vve  can  see  that 
the  model  is  trying  to  activate  both  values  on  the  I’OIN  I  INI  SS  dimension  Because  hammer  and  haiehei  are  idem 
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This  example  has  been  described  in  some  detail,  in  part  to  explicate  our  displays  of  the 
model's  responses  to  particular  sentences.  In  the  process  we  have  seen  clearly  that  dog  and 
hammer  trigger  the  appropriate  bindings  of  arguments  to  slots.  This  is  also  the  case  for  exam¬ 
ples  of  the  form  The  plate  broke.  There,  Figure  9  indicates  that  the  only  slot  in  which  there  is 
appreciable  activity  is  the  Patient  slot. 

These  examples  illustrate  that  the  model  has  learned  quite  a  bit  about  assigning  fillers  to 
slots  on  the  basis  of  the  microfcaturcs  of  the  slot  fillers  involved.  For  the  word  break,  animate 
surface  subjects  are  treated  as  Agents  and  inanimate  surface  subjects  arc  treated  as  Instru¬ 
ments  if  an  Object  is  specified;  if  not,  the  inanimate  surface  subject  is  treated  as  Patient.  The 
model  seems  to  capture  this  fact  pretty  well  in  its  behavior!0 

The  hoy  hit  the  girl  and  the  girl  hit  the  boy.  For  the  verb  hit ,  there  is  a  possibility 
that  a  sentence  describing  an  instance  of  hitting  will  have  two  animate  arguments,  which  may 
be  equally  plausible  candidates  to  serve  as  Agent.  The  only  way  to  tell  which  is  the  Agent  (in 
the  absence  of  other  context)  is  to  rely  on  word-order  information.  We  know  that  boy  is  the 
Agent  in  the  The  boy  hit  the  girl  only  because  it  occurs  in  the  prevcrbal  position.  The  model 
has  no  difficulty  coping  with  this  fact.  Figure  10  shows  in  summary  form  the  features 
activated  by  the  sentences  The  girl  hit  the  boy  (a  sentence  the  model  actually  experienced  dur¬ 
ing  learning)  and  The  boy  hit  the  girl  (a  sentence  not  experienced  during  learning).  In  both 
cases,  the  model  activates  the  feature  pattern  for  the  correct  argument  in  the  correct  slot.  This 
is  so.  even  though  the  feature  patterns  for  boy  and  girl  differ  by  a  single  feature."  As  a  more 
formal  test  of  the  model’s  ability  to  assign  slot  fillers  to  slots  based  only  on  sentence-structure 
information,  we  tested  the  model  on  the  full  set  of  12  different  human-hit-human  sentences.  In 
all  cases,  the  prevcrbal  argument  more  closely  matched  the  pattern  of  activation  in  the  Agent 


ical  except  for  this  feature,  the  representation  is  really  describing  a  somewhat  pointy  hammer  (or  perhaps  blunt 
hatchet). 

10.  One  thing  that  we  see  illustrated  here  is  that  with  certain  novel  sentences,  the  model  may  have  a  tendency  to 
misgenerate  some  of  their  features  when  assigning  them  to  underlying  slots.  Thus,  for  example,  in  the  case  of  t  he 
plate  broke,  only  some  of  the  features  are  produced  faithfully.  These  are.  in  fact,  (he  features  associated  with  the 
slot  fillers  that  the  model  actually  learned  to  deal  with  in  this  sentence  frame  (wise,  wirtJow).  The  ones  til  at  are 
poorly  reproduced  arc  the  features  that  the  familiar  exemplars  differ  on  or  which  dilfer  between  the  familiar  exam¬ 
ples  and  plate.  Such  errors  would  be  greatly  reduced  if  a  more  disparate  range  of  Patient  intransitive  verbs  with  a 
more  disparate  range  of  subjects  had  been  used  in  the  learning.  Such  errors  could  also  be  cleaned  up  quite  a  bit  bv 
an  auto-associative  network  of  connections  among  the  case  role  units.  The  virtues  of  augmenting  the  model  with 
such  a  network  are  considered  in  more  detail  later. 

11.  Though  the  model  did  not  actually  learn  the  sentence  The  boy  hit  the  girl  or  am  other  sentence  containing  bo\ 
and  girl  as  Subject  and  Object,  it  did  learn  several  sentences  in  which  boy  was  die  subject  of  hit  and  several  others 
in  which  girl  was  the  object.  As  it  happened,  several  of  these  involved  modifiers  of  girl,  hence  the  rather  diffuse  pal 
tern  of  activation  over  the  Modifier  units. 
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Figure  10.  Summaries 


the  role  assignment  patterns  produced  b>  The  $irt  hit  the  boy  and  The  bov  hit  the 
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role,  and  the  postverbal  argument  more  closely  matched  the  pattern  in  the  Patient  role. 
I'erh-Frame  Selection 

Part  of  the  task  of  the  model  is  to  put  slot  fillers  in  the  correct  places,  but  there  is  more 
that  it  must  do  than  this.  It  must  also  determine  from  the  ensemble  of  arguments  what  read¬ 
ing  of  the  verb  is  intended.  By  this,  we  mean  which  of  the  possible  scenarios  the  verb  might 
describe  is  actually  being  described  in  this  -  trticular  case.  For  example,  the  sentences  The  hoy 
broke  the  window  with  the  hammer  generates  quite  a  dilTerent  mental  scenario  than  The  Jog 
broke  the  window  or  The  window  broke.  Our  model  captures  the  differences  between  these 
scenarios  in  two  ways:  one  is  simply  in  terms  of  the  set  of  underlying  roles  and  the  assignment 
of  sentence  constituents  to  these  roles.  We  have  already  seen  in  our  earlier  discussion  o ['break 
that  the  model  produced  a  different  set  of  slots  and  assigned  the  prcverbal  noun  phrase  to  a 
different  role  in  each  of  the  sentences  T'.e  dot;  broke  the  window.  The  hammer  broke  the  vase, 
and  The  plate  broke.  The  other  way  the  model  captures  the  differences  between  scenarios  is  in 
terms  of  the  pattern  of  activation  of  verb  features  in  the  Agent,  Patient,  and  Instrument  slots. 
Thus  in  The  hoy  hit  the  girl,  we  visualize  physical  contact  between  the  boy  and  the  girl;  in  the 
case  of  The  boy  hit  the  girl  with  the  rock,  we  visuali/e  physical  contact  between  the  rock  and 
the  girl.  These  and  other  related  distinctions  arc  captured  tudmittedlv.  imperfectly!  in  the 
dilferent  feature  patterns  associated  with  the  verb  frames.  As  with  the  feature  patterns  for 
nouns,  the  patterns  that  we  used  do  not  adequately  encode  the  differential  flexibility  of 
different  aspects  of  the  different  scenarios.  Nevertheless  they  capture  the  essence  of  the 
difference,  say,  between  one  or  the  other  kind  of  hitting. 

The  features  of  the  scenario  are  captured  in  the  pattern  of  activation  of  the  case  role 
units.  To  this  point,  we  have  been  summing  along  columns  of  units  to  determine  the  features 
of  the  object  assigned  to  a  particular  role  by  the  model.  To  determine  the  features  of  the 
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action  or  scenario  assigned  to  the  sentence,  vve  need  to  look  along  the  rows.  Figures  3,  7,  and 
8  indicate  that  features  in  somewhat  difTercnt  patterns  of  rows  are  activated  by  the  sentences 
The  boy  broke  the  window  with  the  hammer ,  The  dog  broke  the  plate,  and  The  hammer  broke  the 
vase.  These  are,  indeed,  the  correct  verb  features  in  each  one.  Thus,  we  see  that  the  model 
has  learned  to  successfully  assign  a  different  scenario,  depending  on  the  arguments  supplied 
along  with  the  verb. 

In  general,  the  model  did  quite  well  selecting  the  correct  scenario.  For  every  unambigu¬ 
ous  test  sentence,  familiar  or  unfamiliar,  the  value  on  each  scenario  dimension  that  was  most 
active  was  the  correct  value. 

Filling  in  Missing  Arguments 

The  model  does  a  very  good  job  filling  in  plausible  default  values  for  missing  arguments. 
To  demonstrate  this,  we  tested  the  model  on  the  sentence  fragment  The  boy  broke.  The  results 
are  shown  in  Figure  11.  The  model  fills  in,  fairly  strongly,  a  plausible  but  underspecified  fra¬ 
gile  object  —  something  that  is  nonhuman,  neuter,  nonpomted,  fragile,  and  has  object-type  fur¬ 
niture  {plate ,  vase,  and  window  are  all  classified  as  furniture  in  tnc  model).  Values  on  the  si/e 
(VOLUME)  and  shape  (FORM)  dimensions  are  very  weakly  specified.12 

We  see  similar  kinds  of  things  happening  with  The  girl  ate,  though  in  this  case,  the 
model  is  actually  taught  to  fill  in  food  of  unspecified  form,  so  this  performance  is  not  surpris¬ 
ing.  Something  slightly  different  happens  with  The  man  moved'.  The  model  is  taught  to  treat 
man  as  boil  Agent  and  Patient  in  such  sentences;  and  indeed,  the  pattern  for  man  predom¬ 
inates  in  the  Patient  slot.  (Note  that  if  an  object  is  specified,  as  in  The  man  moved  the  piano,  it 
is  handled  correctly.) 

These  additional  examples  make  two  points:  First,  that  the  model  can  be  explicitly 
trained  to  fill  in  implied  arguments;  and  second,  that  the  filling  in  of  implied  arguments  is 
clearly  specific  to  the  particular  verb.  What  is  filled  in  on  the  Patient  slot  is  quite  different  for 
each  of  these  three  examples. 


12.  While  the  model’s  response  to  The  boy  broke  clearly  illustrates  default  assignment,  il  differs  from  the  way  many 
people  appear  to  process  this  input;  several  people  have  commented  to  us  dial  they  read  this  fragment  as  a  complete 
sentence  specifying  that  it  was  the  boy  dial  broke,  even  though  this  produces  a  somewhat  anomalous  reading.  The 
model's  response  to  The  boy  broke  is  closer  to  the  kind  of  thing  most  people  get  with  a  verb  like  hit,  which  does  not 
have  an  intransitive  reading. 
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Figure  1  1.  Summaries  of  ihe  model  s  response  to  The  boy  broke.  The  girl  ate.  and  The  man  moved. 


Lexical  Ambiguity  Resolution 

In  general,  the  model  does  very  well  with  ambiguous  words.  That  is.  it  has  little 
difficulty  determining  which  reading  to  assign  to  an  ambiguous  word  based  on  its  context  of 
occurrence  -  as  long  as  the  context  is  itself  sufficient  to  disambiguate  the  meaning  of  the 
word. 


To  demonstrate  this  point,  we  carried  out  a  number  of  analyses  of  the  model's  responses 
to  sentences  containing  the  ambiguous  nouns  chicken  (live  or  cooked)  and  hai  (fixing  or  base¬ 
ball).  We  divided  the  sentences  containing  these  ambiguous  words  into  those  that  had  onl\ 
one  case-frame  representation  derivable  from  the  generators  and  those  that  could  have  been 
generated  in  two  different  ways.  An  example  of  the  former  kind  of  sentence  is  l he  chicken  are 
ihe  carrot,  since  only  a  live  chicken  could  occur  as  the  preverbal  \P  with  ale.  An  example  of' 
the  latter  kind  of  sentence  is  The  hat  broke  the  window,  which  could  be  generated  either  from 
The  animal  broke  the  fragile  _ohject  or  from  The  breaker  broke  ihe  fragile  _"bjecl.  Given  the  wt 
of  generators,  the  context  specifics  which  reading  is  correct  for  the  first  kind  of'  sentence  but 
does  not  specify  which  reading  is  correct  for  the  latter.  Since  our  present  interest  is  to  see  how 
well  the  model  can  do  at  using  context  to  resolve  ambiguity,  we  focus  here  on  the  former  tv  pc. 

Our  first  analysis  simply  compared  the  model’s  performance  with  these  sentences  to  its 
performance  with  sentences  containing  no  ambiguous  words.  Since  a  large  number  of  factors 
contribute  to  the  number-of-bits-incorrcct  measure,  we  focused  on  a  set  of  seven  matched 
pairs  of  sentences  that  were  generated  from  the  same  generators  and  were  also  either  both  old 


i)  o 


McClelland  and  Kawamoto 
April  30.  198t> 


Sentence  Processing 

31 


or  both  new,  but  differed  in  that  one  contained  an  ambiguous  word  (e.g..  The  -.vermin  aie  the 
chicken  with  the  carrot )  and  the  other  did  not  (  The  boy  ate  the  carrot  with  the  pasta ).  The  aver¬ 
age  number  of  features  missed  was  28.4  for  the  items  containing  ambiguous  words  and  29.1 
for  the  control  items,  f(  1,6)  <  1. 

Another  way  to  examine  the  data  is  to  examine  the  relative  strengths  of  the  features  of 
the  two  readings  of  an  ambiguous  word  in  the  output  summary  representation.  Figure  i2 
indicates  a  typical  case  and  one  interesting  exception  to  the  general  pattern.  In  the  typical 
case  i  I  he  man  ate  the  chicken  with  the  fork),  we  see  that  the  features  of  the  cooked  chicken 
have  been  strongly  activated  in  the  Patient  slot,  and  though  there  arc  some  weak  traces  of  the 
features  of  the  live  chicken,  they  are  not  stronger  than  the  weak  extraneous  activation  we 
often  see  of  incorrect  features  for  unambiguous  words.  In  the  atxpical  case  t  The  wolf  ate  the 
chicken),  the  pattern  is  really  much  closer  to  the  cooked  chicken  than  the  live  one  that  the 
model  was  ’’supposed"  to  have  retrieved.  It  is  difficult  to  fault  the  model  too  much  in  this 
case,  however.  Though  it  was  never  given  sentences  of  the  form  The  animal  ate  the  food,  it 
was  given  sentences  of  the  form  The  animal  ate.  where  the  underlying  case  frame  included 
i  implied)  food.  Thus  the  model  had  considerable  reason  to  treat  the  object  of  The  animal  ate 
as  food.  Though  it  had  learned  The  lion  ate  chicken  referred  to  a  live  chicken,  it  appears  to 
prefer  to  treat  chicken  in  The  wolf  ate  chicken  more  as  cooked  food  than  as  a  living  animal. 
Some  of  the  properties  of  the  live  chicken  are  weakly  present  --  e.g.,  its  female  sex  and  its 
natural,  animate  object-type  classification  --  but  the  predominant  pattern  is  that  of  food. 

With  this  exception,  the  model  has  shown  itself  quite  capable  of  handling  sentences  con¬ 
taining  ambiguous  words  as  well  as  sentences  containing  only  unambiguous  words.  This  is.  of 
course,  not  really  surprising  in  view  of  the  fact  that  all  the  words  in  the  sentence  are  used  to 
help  constrain  the  features  assigned  to  the  fiiiers  of  every  role  in  the  case  frame.  When  one 
word  does  not  provide  the  information  to  constrain  the  output,  the  model  can  exploit 
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information  contained  in  other  words. 

Structural  Ambiguity 

In  this  section,  we  briefly  consider  another  type  of  ambiguity  that  sometimes  arises  even 
when  the  words  in  the  sentence  are  unambiguous.  This  is  the  structural  ambiguity  of  sen¬ 
tences  such  as  The  man  hit  the  woman  with  the  hammer.  In  such  cases,  hammer  can  cither  be 
the  Instrument  or  simply  a  Modifier  of  woman.  This  case-structure  ambiguity  parallels  an 
ambiguity  in  the  syntactic  structure  of  the  sentence  as  well;  if  hammer  is  an  Instrument,  it  is 
dominated  directly  by  the  VP;  whereas  if  it  is  a  Modifier,  it  is  dominated  by  the  NP  the  win¬ 
dow.  Because  hammer  was  included  both  in  the  list  of  possible  possession-modifiers  of  human 
objects  and  in  the  list  of  possible  instruments  of  hitting  (designated  as  possessions  and  hitters 
in  the  sentence  frames  in  Table  2  and  in  the  noun  categories  in  Table  3).  either  of  these  read¬ 
ings  is  equally  possible.  Thus,  it  is  not  so  surprising  that  the  model  has  considerable  difficulty 
with  such  sentences,  generating  a  blend  of  the  two  readings. 

A  particularly  interesting  case  of  case-structure  ambiguity  occurs  with  the  sentence  The 
bat  broke  the  window.  As  already  mentioned,  the  sentence  is  both  lexically  ambiguous  and 
ambiguous  in  case  structure,  and  the  structural  ambiguity  hinges  on  the  lexical  ambiguity.  If 
bat  is  a  baseball  bat  then  the  case  frame  specifies  an  Instrument  and  a  Patient,  but  if  it  is  a 
fixing  bat  then  the  case  frame  specifies  an  Agent  and  a  Patient. 

Figure  13  illustrates  the  pattern  of  activation  generated  in  this  case.  What  the  model 
docs,  quite  sensibly,  is  activate  one  kind  of  bat  —  the  flying  bat  --  on  the  Agent  units  and  the 
other  —  the  baseball  bat  —  on  the  Instrument  units  (sec  the  figure  caption  for  a  detailed  expla¬ 
nation  of  the  use  of  the  black  background  in  this  figure). 

People  generally  get  only  one  reading  of  a  sentence  at  a  time,  even  when  (as  in  this  case) 
the  sentence  is  easy  to  interpret  in  cither  of  two  ways.  In  a  later  section  of  this  chapter  we 
explain  how  cross-connections  among  the  case  role  units  and  back-connections  to  sentence- 
level  units  would  tend  to  cause  the  model  to  choose  a  single  interpretation,  even  in  these 
ambiguous  cases. 

Shades  of  Meaning 

Another  property  of  the  model,  related  to  its  handling  of  ambiguity,  is  the  fact  that  it 
can  shade  the  feature  patterns  it  assigns,  in  ways  that  often  seem  quite  appropriate.  One 
example  of  this  tendency  arises  in  the  model's  response  to  the  sentence  The  hall  broke  the  vase. 
A  summary  of  the  pattern  produced  is  shown  in  Figure  M.  The  feature  pattern  on  the  Instru¬ 
ment  units  matches  the  features  of  ball  fairly  closely.  The  value  on  each  dimension  that  is 
most  strongly  activated  is  the  correct  value,  except  on  one  dimension  —  the  hard  soft  dimen¬ 
sion.  On  this  dimension,  we  can  see  that  the  model  has  gotten  ball  completely  wrong  --  it  has 
strongly  activated  hard,  instead  of  soft. 
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F  igure  13.  Stale  of  the  model  alter  processing  the  sentence  The  bat  broke  the  window.  F  or  tins  sentence,  the 
black  background  on  the  Agent  units  shows  which  units  would  be  active  in  the  pattern  lor  jlvin%  bat  as  Agent  of  the 
agenuve  (AV  P)  reading  of  broke.  T he  black  background  on  the  Instrument  units  indicates  the  units  that  would  be 
active  in  the  pattern  for  baseball  bat  as  Instrument  of  the  instrumental  (I VP)  reading  of  broke.  The  black  back 
ground  on  the  Patient  units  indicates  the  units  that  would  be  active  in  either  the  pattern  appropriate  for  window  as 
Patient  of  the  agentive  reading  of  broke  or  in  the  instrumental  reading  of  broke. 


figure  14  Summarv  of  the  activations  on  the  case  role  units  fo .  the  sentence  The  ball  V  Ac  the  .ase 
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In  one  sense,  this  is  clearly  an  "error"  on  the  model's  part;  all  the  halls  that  it  learned 
about  were  soft,  not  hard  balls.  But  in  another  sense,  it  is  a  perfectly  sensible  response  for  the 
model  to  make.  All  of  the  other  instruments  of  breaking  (called  breakers  in  Table  3)  were,  in 
fact,  hard.  The  model  picked  up  on  this  fact,  and  shaded  its  interpretation  of  the  meaning  of 
the  word  hall  as  a  result.  As  far  as  this  model  is  concerned,  balls  that  are  used  for  breaking 
are  hard,  not  soft. 

This  kind  of  shading  of  meaning  is  just  another  manifestation  of  the  process  that  fills  in 
missing  arguments,  chooses  appropriate  verb  frames,  and  selects  contextually  appropriate 
meanings  of  words.  It  is  part  and  parcel  of  the  mechanism  that  generally  results  in  the  activa¬ 
tion  of  the  nominally  correct  feature  pattern.  It  is  a  mechanism  that  naturally  blends  together 
what  it  learns  into  a  representation  that  regularizes  slot  fillers,  in  the  same  way  that  the  verb 
learning  model  discussed  in  Chapter  18  regularizes  verbs. 

Other  Creative  Errors 

The  model  made  a  number  of  other  interesting  "errors."  Its  response  to  The  doll  moved 
was  a  particularly  striking  example.  Recall  that  the  training  stimuli  contained  sentences  from 
the  frames  The  animal  moved ,  The  human  moved,  and  The  object  moved.  In  the  first  two  cases, 
the  case  frame  contained  the  subject  as  both  Agent  and  Patient,  as  in  The  animal  moved  itself. 
In  the  third  case,  the  case  frame  contained  the  subject  only  as  Patient.  I  he  model  had  some 
difficulty  with  these  constructions,  but  generally  put  inanimate  subjects  into  the  Patient  role 
only  (as  in  The  desk  moved),  and  animate  subjects  into  both  the  Agent  and  Instrument  role. 
With  The  doll  moved,  however,  the  case-frame  representation  shows  considerable  activation  in 
the  Agent  slot.  The  partern  of  activation  there  (Figure  15)  seems  to  be  prettv  much  that  of  a 
small  (though  fragile  and  nonhuman)  girl  -  or  perhaps  it  is  simply  an  animate  doll. 

Generalization  to  Novel  Words 

We  have  already  seen  that  the  model  can  generalize  quite  well  from  particular  sentences 
that  it  has  learned  about  to  new  ones  that  it  has  not  seen  before.  It  is,  obviously,  important 
to  be  able  to  generalize  in  this  way,  since  we  cannot  expect  the  training  set  of  sentences  to 
cover  all  possible  word  combinations  in  the  language,  even  if  they  contain  only  a  single  clause. 
Thus  far.  however,  we  have  only  considered  generalization  to  sentences  made  up  of  familiar 
words.  What  would  happen,  we  wondered,  if  we  tested  the  model  using  new  words  that  it  had 
never  seen  before?  To  examine  this  issue,  we  tested  the  model  on  the  verb  touch.  Touch 
differs  from  ha  in  only  one  feature,  namely,  intensity;  otherwise  all  of  the  same  verb  features 
are  appropriate  to  it.  We  assumed  that  the  model  already  knew  the  meaning  of  touch  --  that 
is.  we  assumed  that  the  correct  input  microfeatures  were  activated  on  presentation  of  a  sen¬ 
tence  containing  it.  We  then  took  all  of  the  test  sentences  containing  the  word  hit,  replaced 
hit  with  touched,  and  tested  the  model  on  each  of  these  sentences.  Overall,  performance  was 
somewhat  worse  than  with  ha,  but  the  model  was  still  able  to  assieti  areuments  to  the  correct 
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figure  15.  Stale  of  the  model  after  processing  the  sentence  The  dull  moved.  The  units  with  the  black  back¬ 
ground  in  the  Agent  slot  indicate  the  units  appropriate  for  the  pattern  corresponding  to  an  'animate  doll "  as  Agent 
of  the  \gent-Verb-Self  (.A VS)  reading  of  move.  The  units  with  the  black  background  in  the  Patient  slot  indicate  the 
units  appropriate  for  cither  an  animate  r  (normal)  toy  doll  as  Patient  of  either  the  A  VS  or  Patient  kerb  (PVl  read 
mg  of  move. 


underlying  roles.  In  particular,  it  had  no  difficulty  assigning  animate  Subject  NI’s  to  the  role 
of' Agent  and  the  Object  \P  to  the  role  of  Patient,  nor  did  it  have  anv  problem  assigning  inan¬ 
imate  subjects  to  the  role  of  Instrument,  as  illustrated  in  figure  in. 

I  wo  characteristics  of  the  model’s  performance  with  this  novel  verb  should  be  noted. 
The  first  is  that  it  does  not  activate  all  the  correct  verb  features:  The  verb  features  captured  in 
the  ease  role  representation  arc  appropriate  fui  Hu  rather  than  i<<uch.  1  here  are  two  reasons 
for  this.  One  is  the  redundancy  of  the  representation  of  the  verbs.  I  he  input  representation 
we  are  using  is  an  extremely  rich  and  redundant  representation  for  capturing  the  very  small 
number  of  verbs  that  we  have  actually  used,  so  the  model  learns  to  predict  features  from  other 
features.  In  a  much  less  redundant  set  of  verbs,  the  model  would  be  forced  to  learn  to  relv  on 
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just  the  right  features  of  the  sentence-level  pattern  to  predict  the  features  of  the  case  role 
representation  correctly.  The  second  reason  is  that  the  deviant  feature  on  the  word  touch  does 
not  fall  inside  a  subspacc  delineated  by  a  set  of  related  exemplars  that  behave  in  similar  ways. 
What  we  are  seeing  here,  essentiallv,  is  assimilation  to  a  single  familiar  \erb  that  is  near  the 
novel  verb  in  featun  space.  When  a  novel  word  is  presented  that  more  nearly  falls  inside  the 
subspace  delineated  by  a  set  ol  related  exemplars,  the  model  is  better  able  to  correctly 
represent  all  the  features  of  the  novel  word  in  the  case  role  representation.  To  show  this,  we 
defined  a  new  fragile  object,  a  lamp,  that  was  nonhuman,  hard,  neuter,  medium-si/ed.  one- 
dimcnsional,  not  pointed,  fragile,  and  furniture.  This  set  of  features  matches  at  least  one  of 
the  breakable  objects  on  every  dimension,  but  matches  none  of  them  on  all  of  the  dimensions. 
We  then  tested  the  model  by  taking  all  of  the  familiar  test  sentences  containing  any  of  the 
familiar  breakable  objects  i  window,  vase,  and  plate I  and  then  testing  the  model  on  these  sen¬ 
tences  with  the  word  lamp  in  the  place  of  the  familiar  breakable  object.  In  many  cases,  as  in 
the  first  example  shown  in  f  igure  IT  the  model  activated  all  the  features  correctly.  I  hat  is, 
the  correct  feature  for  lamp  on  each  dimension  was  more  strongly  activated  than  any  other. 
The  model  was  most  vulnerable  to  error  on  the  VOl.l.'Ml:  and  1 ORM  dimensions,  as  in  the 
second  example  shown,  where  it  differed  from  two  of  the  three  familiar  fragile  objects  in  each 
case.  I  he  example  indicates,  then,  that  the  desirable  tendency  of  the  model  to  shade  the 
meanings  of  the  words  it  encounters  to  fit  the  typical  features  it  expects  in  certain  contexts 
need  not  lead  to  severe  distortions  of  novel  words,  as  long  as  their  feature  patterns  lull  inside 
the  space  spanned  by  a  set  of  similar  words  that  are  treated  by  the  model  in  a  consistent  way  . 

Straightforward  Extensions 

The  model  does  many  things  well,  but  it  is  reallv  just  the  first  step  toward  a  POP  model 
of  sentence  processing.  In  the  general  discussion  we  will  consider  some  of  the  more  general 
issues  that  confront  the  attempt  to  extend  a  model  of  this  type  further.  Here,  we  will  just 
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Figure  I".  Summary  of  (he  response  of  the  model  to  The  woman  broke  the  .'amp  and  Pke  boy  broke  the  .'amp. 


mention  some  straightforward  extensions  that  should  be  fairly  easy  to  implement.  We  have 
not  yet  implemented  these  changes,  so  our  discussion  is  somewhat  speculative. 

One  straightforward  set  of  extensions  to  the  model  would  involve  adding  cross- 
connections  among  the  case  role  units.  The  cross-connections  would  allow  argument  co¬ 
occurrence  constraints  that  arc  verb-frame  specific  to  be  captured  more  effectively  than  they 
can  now.  In  the  present  version  of  the  model,  the  influence  of  one  argument  on  another  can 
only  essentially  add  together  with  the  influence  of  the  verb  on  that  argument.  This  will  ulti¬ 
mately  prove  insufficient,  we  believe,  because  different  verbs  carry  with  them  radically  different 
interdependencies  among  their  arguments.  For  example,  see  imposes  some  restrictions 
between  the  Instrument  and  the  Object  fas  in  Bill  suw  the  man  with  the  telescope  and  Bill  saw 
the  man  with  the  revolver,  the  first  is  ambiguous,  the  second  is  not)  that  are  different  from 
those  posed  by  hit  ( Bill  hit  the  man  with  the  telescope  and  Bill  ha  the  man  with  the  revolver  seem 
equally  ambiguous).  The  present  version  of  the  model  would  not  easily  capture  these  kinds  of' 
verb-contingent  argument  constraints  because  each  argument  is  encoded  separately  from  the 
verb  at  the  sentence  level,  and  it  is  only  connections  from  units  at  the  sentence  level  that 
determine  the  input  to  the  case  role  units.  This  means  that  conjunctions  of  noun  characteris¬ 
tics  with  verb  characteristics  are  not  directly  at  work  in  determining  case  role  unit  activations. 
The  case  role  units,  however,  do  provide  just  such  a  conjunctive  encoding  It  seems  likely, 
therefore,  that  connections  among  these  units  would  be  able  to  capture  verb-specific  ■  or.  more 
exactly,  verb-feature  specific)  contingencies  between  slot  fillers  and  their  assignments  to  roles. 

Cross-connections  among  the  case  role  units  would  add  a  number  of  other  advantages, 
as  well.  They  would  allow  competition  among  alternative  interpretations  of' the  same  word  at 
the  case-frame  level,  so  that  the  stronger  of  two  competing  interpretations  of' the  same  word 
could  effectively  suppress  the  weaker. 
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A  second  straightforward  extension  would  be  the  addition  of  back-connections  from  the 
case  role  units  to  the  sentence-structure  units.  This,  too,  could  have  several  beneficial  effects 
on  the  performance  of  the  model.  In  an  extended  version  of  the  model,  with  cross-connections 
and  back-connections,  the  computation  performed  by  the  present  version  of  the  model  would 
be  just  the  first  step  in  an  iterative  settling  process.  This  settling  process  could  be  used  to  fill  in 
the  features  of  one  reading  or  another  of  an  ambiguous  word  at  the  sentence  level,  based  on 
the  emerging  pattern  at  the  case  role  level.  Once  filled  in,  these  features  could  then  add  to  the 
support  for  the  "dominant”  reading  over  other,  initially  partially  activated  readings  --  the 
whole  network  would,  in  effect,  drive  itself  into  a  stable  "corner”  that  would  tend  to  represent 
a  coherent  interpretation  at  both  the  sentence  and  the  case  role  level.  Kawamoto  (1985)  has 
observed  just  such  effects  in  a  simulation  of  word  disambiguation  based  on  the  brain-state-in- 
a-box  model  of  J.  A.  Anderson,  Silverstcin,  Ritz,  and  Jones  (1977).  (Cottrell,  1985.  and  Waltz 
&  Pollack.  1985,  have  also  observed  such  elfects  in  their  more  loealist  sentence-processing 
models.) 

Back  connections  would  also  allow  case  role  activations  to  actually  specify  the  semantic 
features  of  novel  or  unfamiliar  words  occurring  in  constraining  contexts.  Consider,  for  exam¬ 
ple,  the  sentence,  The  girl  broke  the  shrafe  with  the  feather.  The  context  provides  a  consider¬ 
able  amount  of  constraint  on  the  properties  of  shrafe.  The  existing  version  of  the  model  is 
able  to  fill  in  a  plausible  interpretation  at  the  case  level,  but  with  feedback  it  would  be  able  to 
pass  this  information  back  to  the  sentence  level. 

Another  way  of  passing  information  back  to  the  surface-structure  level  would  be  to  use 
the  back-propagation  learning  algorithm.  The  use  of  back  propagation  to  train  the  sentence- 
structure  units  could  allow  the  right  features  to  be  constructed  at  the  surface  level  with  only  a 
phonological  representation  of  the  words  as  the  predefined  input.  Back  propagation  might 
also  allow  us  to  cut  down  on  the  rather  excessive  numbers  of  units  currently  used  in  the 
surface-structure  level.  Right  now  there  arc  far  more  SS  units  than  are  strictly  necessary  to  do 
the  work  that  the  model  is  doing.  While  many  units  could  be  eliminated  in  a  straightforward 
way  (e  g.,  many  of  the  sentence  units  could  be  eliminated  because  they  stand  for  unlikely  con¬ 
junctions  of  features  across  dimensions,  such  as  human  and  tonf).  many  more  are  simply  redun¬ 
dant  ways  of  encoding  the  same  information  and  so  could  be  consolidated  into  fewer  units.  On 
the  other  hand,  for  a  larger  vocabulary,  some  conjunctive  units  will  turn  out  to  be  necessary, 
and  pair-wise  conjunctive  units  ultimately  will  probably  not  suffice.  Indeed,  we  feel  quite  sure 
that  no  predefined  coding  scheme  of  the  kind  we  have  used  could  prov  ide  a  sufficient  basis  for 
learning  all  the  sentences  in  any  real  language  without  being  immensely  wasteful,  so  it  will 
become  crucial  to  train  the  sentence  and  case  role  units  to  represent  just  the  needed  conjunc¬ 
tions  of  features. 
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Distributed  Representation  of  Roles 

We  mention  a  final  "straightforward"  extension  of  the  model  under  a  separate  heading 
because  it  is  both  more  speculative  and  perhaps  somewhat  more  difficult  to  understand  than 
the  previous  suggestions.  This  is  the  idea  of  using  a  distributed  representation  of  the  roles. 
The  idea  was  first  suggested  by  Hinton  (1981a),  and  is  currently  under  exploration  by  Derthick 
and  Hinton  at  Carnegie- Mellon  University.  The  essence  of  the  idea  is  to  think  of  roles  not  as 
strictly  separate,  monolithic  objects,  but  as  sets  of  role  descriptors.  Thus  the  role  of  Agent  has 
certain  properties:  It  specifies  an  active  participant  in  the  scenario,  one  that  may  be  voli¬ 
tional;  it  specifies  the  instigator  of  the  action.  The  role  of  Patient,  on  the  other  hand,  specifies 
a  passive  participant,  one  whose  volitional  involvement  is  (pretty  much)  irrelevant,  but  who  is 
the  one  that  experiences  the  effects  of  the  action. 

Various  problems  arise  with  treating  these  roles  as  unitary  objects.  One  is  that  some  but 
not  all  of  the  Patient  properties  generally  hold  for  the  role  nominally  identified  as  Patient. 
Similarly,  some  but  not  all  of  the  Agent  properties  generally  hold  for  the  role  nominally 
identified  as  Agent.  In  certain  cases,  as  with  sentences  like  The  boy  moved,  enough  of  these 
properties  hold  that  wc  were  led  to  assign  the  boy  to  both  roles  at  once. 

One  suggestion  that  has  often  been  made  is  to  proliferate  separate  roles  to  deal 
separately  with  each  of  the  slight  variants  of  each  of  the  traditional  cases.  This  leads,  of 
course,  to  a  proliferation  of  roles  that  is  ungainly,  unwieldy,  and  inelegant,  and  that  detracts 
considerably  from  the  utility  of  the  idea  of  roles  as  useful  descriptive  constructs. 

Here,  distributed  representation  can  provide  an  elegant  solution,  just  as  it  has  in  other 
instances  where  there  appears  to  be  a  temptation  to  proliferate  individualized,  unitary 
representational  constructs  (see  Chapter  17).  If  each  role  is  represented  by  a  conjunction  of 
role  properties,  then  far  more  distinct  roles  can  be  represented  on  the  same  set  of  role  units. 
Furthermore,  what  the  Agent  roles  of  two  verbs  have  in  common  is  captured  by  the  overlap  of 
the  role  features  in  the  representations  of  their  roles,  and  how  they  differ  is  captured  by  their 
differences.  The  notion  of  a  role  that  represents  a  combined  Agent  Patient  as  in  The  buy 
moved  is  no  longer  a  special  case,  and  wc  get  out  of  assigning  the  same  argument  to  two 
different  slots. 

So  far,  the  vision  outstrips  the  implementation  of  this  idea,  but  we  will  sketch  briefly  one 
very  rudimentary  instantiation  of  it,  in  what  really  amounts  to  a  rather  slight  modification  of 
the  model  we  have  described  above.  Currently,  our  case  role  units  stand  for  conjunctions  of  a 
role,  a  feature  of  the  verb,  and  a  feature  of  the  filler  of  the  role.  The  suggestion,  quite  simply  , 
is  to  replace  these  units  with  units  that  stand  for  a  feature  of  a  role,  a  feature  of  the  verb.  and 
a  feature  of  the  filler  of  the  role.  The  first  NP  in  The  boy  broke  the  window  with  the  hammer 
will  produce  a  pattern  of  activation  over  one  set  of  these  triples  (corresponding  pretty  much  to 
the  canonical  features  of  agenthood),  while  the  boy  in  The  boy  moved  would  activate  some 
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units  from  other  role  feature  sets,  as  well  as  many  of  the  typical  agent  feature  units. 

Again,  we  stress  that  we  do  not  yet  have  much  experience  using  this  kind  of  distributed 
representations  of  roles.  However,  Derthick  and  Hinton  (personal  communication,  1985)  are 
exploring  these  ideas  in  the  context  of  a  PDP  implementation  of  the  representation  language 
KL-TWO  (Brachman  &  Schmolzc,  1985;  Moser,  1983),  They  have  already  shown  that  at  least 
one  version  of  the  idea  can  be  made  to  work  and  that  the  coarse  coding  of  roles  can  be  used 
to  allow  inheritance  of  constraints  on  role  fillers. 

DISCUSSION 

Now  that  we  have  examined  the  model  in  some  detail  and  considered  some  possible 
extensions  of  it,  we  turn  to  more  general  considerations.  We  consider  thice  issues,  first,  we 
examine  the  basic  principles  of  operation  of  the  model  and  mention  briefly  why  they  are 
important  and  useful  principles  for  a  sentence-processing  model  to  embody.  Second,  we  con¬ 
sider  some  of  the  implications  of  the  model  for  thinking  about  language  and  the  representation 
of  language.  Third,  we  address  the  limitations  of  the  present  model.  This  part  of  the  discus¬ 
sion  focuses  on  a  key  question  concerning  the  feasibility  of  our  approach,  namely,  the  require¬ 
ment  that  any  plausible  model  of  language  processing  must  be  able  to  handle  sentences  con¬ 
taining  embedded  clauses. 

Basic  Features  of  the  Model 

We  emphasize  before  we  begin  that  the  basic  features  of  the  present  model  are  shared 
with  a  number  of  other  distributed  models,  especially  those  of  Hinton  (1981a)  and  those 
described  in  Chapters  3,  14,  17,  and  18  of  this  book.  The  two  most  important  properties  of 
the  model  arc  its  ability  to  exploit  the  constraints  imposed  by  all  the  arguments  in  a  sentence 
simultaneously  and  its  ability  to  represent  shades  of  meaning.  These  aspects  are  basic,  we 
believe,  to  any  attempt  to  capture  the  flexibility  and  context-sensitivity  of  comprehension. 

The  first  of  these  properties  is,  perhaps,  just  as  easily  capturable  using  local  rather  than 
distributed  connectionist  networks.  These  local  conncctionist  models  capture  this  property 
much  more  effectively  than  they  have  been  captured  in  nonconnectionist  mechanisms  (e.g., 
Small’s,  1980,  word  expert  parser;  cf.  Cottrell  &  Small.  1983).  Such  networks  have  been 
applied  to  sentence  processing,  particularly  to  the  problems  of  ambiguity  resolution  and  role 
assignment  (Cottrell,  1985;  Waltz  &  Pollack,  1985).  Both  models  use  single  units  to  stand  for 
alternative  meanings  of  words  or  as  ’’binders”  to  tie  words  to  alternative  roles,  and  use  mutual 
activation  and  inhibition  to  select  between  alternative  meanings  and  alternative  role  assign¬ 
ments. 


The  present  model  exhibits  many  of  these  same  properties,  but  uses  distributed  represen¬ 
tations  rather  than  local  ones.  What  the  distributed  representations  have  that  local  represen¬ 
tations  lack  is  the  natural  ability  to  represent  a  huge  palette  of  shades  of  meaning.  With 
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distributed  representations,  it  is  quite  natural  to  represent  a  blend  of  familiar  concepts  or  a  1 

shaded  version  of  a  familiar  concept  that  fits  a  scenario  (Waltz  &  Pollack,  1985,  do  this  with 

their  context  microfeatures).  Perhaps  this  is  the  paramount  reason  why  the  distributed 

approach  appeals  to  us.  To  be  sure,  it  is  possible  to  represent  different  shades  of  meaning  in  a  ( 

localist  network.  One  can,  for  example,  have  different  units  for  each  significantly  different  j 

variation  of  the  meaning  of  a  word.  A  problem  arises,  though,  in  specifying  the  meaning  of  I 

"significantly  different."  We  will  probably  all  agree  that  there  arc  different  readings  of  the 

word  bat  in  the  sentences  The  bat  hit  the  ball  and  The  bat  flew  round  the  cave.  But  what  about 

the  word  chicken  in  the  sentences  The  woman  ate  the  chicken  and  The  wolf  ate  the  chicken ?  Or 

what  about  the  word  ball  in  The  baby  kicked  the  ball  and  The  ball  broke  the  window ?  There  is 

no  doubt  that  we  think  of  different  balls  in  these  cases;  but  do  we  really  want  to  have  a  j 

separate  unit  in  memory  for  the  soft,  squishy,  rubber  ball  the  baby  kicks  and  the  small,  hard 

ball  that  can  break  a  window? 

* 

With  distributed  representations,  we  do  not  have  to  choose.  Different  readings  of  the 
same  word  are  just  different  patterns  of  activation;  really  different  readings,  ones  that  are 
totally  unrelated,  such  as  the  two  readings  of  bat  simply  have  very  little  in  common.  Readings  I 

that  are  nearly  identical  with  just  a  shading  of  a  difference  are  simply  represented  by  nearly 
identical  patterns  of  activation. 

These  properties  of  distributed  representations  are  extremely  general,  of  course,  and  they  ; 

have  come  up  before,  particularly  in  the  chapter  on  schemata  (Chapter  14).  We  also  just  | 

invoked  them  in  suggesting  that  we  might  be  able  to  use  distributed  representations  instead  of 
some  fixed  set  of  case  roles.  In  both  of  these  other  cases,  as  in  the  case  of  distributed 
representation  of  word  senses,  the  use  of  distributed  representation  allows  for  all  shades  and 
degrees  of  similarity  and  difference  in  two  representations  to  be  captured  in  a  totally  seamless 
way. 

A  final  basic  feature  of  the  model  is  the  gradualness  of  acquisition  it  exhibits.  We  have 
not  stressed  the  time  course  of  acquisition,  but  it  was,  of  course,  a  crucial  property  of  the  verb 
learning  model,  described  in  Chapter  18,  and  it  is  quite  evident  that  acquisition  is  gradual  from 
Figure  4.  As  with  the  verb  learning  model,  our  model  also  seems  to  pick  up  on  the  strongest 
regularities  first.  This  is  seen  most  easily  in  Figures  5  and  6  by  comparing  NVN  sentences  i 

from  the  hi;  and  broke  generators.  Those  with  animate  preverbal  NPs.  which  are  Agents,  arc 
learned  more  quickly  than  those  with  inanimate  prcvcrbal  NPs,  which  are  Instruments.  This  is 
because  a  far  greater  number  of  constructions  have  animate.  Agent  subjects  than  have  inani¬ 
mate,  Instrument  subjects. 

! 

These  three  basic  properties  —  exploitation  of  multiple,  simultaneous  constraints,  the 
ability  to  represent  continuous  gradations  in  meaning,  and  the  ability  to  learn  gradually, 
without  formulating  explicit  rules,  picking  up  first  on  the  major  regularities,  are  hallmarks  of 
parallel  distributed  models,  and  they  are  no  less  applicable  to  comprehension  of  language  than 
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they  are  to  any  other  aspect  of  cognitive  processes. 

Do  Words  Have  Literal  Meanings? 

There  is  one  further  aspect  of  the  distributed  approach  to  representation  of  meaning  that 
should  be  mentioned  briefly.  This  is  the  stand  our  model  takes  on  the  issue  of  whether  words 
have  literal  meanings.  It  is  normal  and  natural  to  think  of  words  as  having  literal  meanings, 
but  it  is  very  difficult  to  say  what  these  meanings  really  are.  For,  as  we  have  noted 
throughout  this  chapter,  the  apparent  meanings  of  words  are  infinitely  malleable  and  very 
difficult  to  pin  down.  An  alternative  view  is  that  words  are  clues  to  scenarios.  This  view, 
which  has  been  proposed  by  Rumclhart  (1979)  among  others,  never  made  very  much  impres¬ 
sion  on  us  until  we  began  to  study  the  present  model.  However,  in  exploring  the  model,  we 
have  found  that  it  embodies  Rumclhart's  idea  exactly.  A  sentence  assembles  some  words  in  a 
particular  order,  and  each  provides  a  set  of  clues  that  constrains  the  characteristics  of  the 
scenario,  each  in  its  own  way.  The  verb,  in  and  of  itself,  may  specify  a  range  of  related 
scenarios  and  certain  constraints  on  the  players.  The  nouns  further  restriefthe  scenario  and 
further  constrain  the  players.  But  the  words  themselves  are  no  longer  present  in  the  scenario, 
nor  is  there  necessarily  anything  in  the  scenario  that  corresponds  to  the  literal  meaning  of  any 
of  the  words.  Thus  in  the  case  of  The  doll  moved ,  the  (partially  activated)  Agent  is  not  a  copy 
of  the  standard  doll  pattern,  but  a  pattern  appropriate  for  a  doll  that  can  move  under  its  own 
steam. 

The  crucial  point,  here,  is  that  all  the  words  work  together  to  provide  clues  to  the  case 
frame  representation  of  the  sentence,  and  none  of  the  words  uniquely  or  completely  determine 
the  representation  that  is  assigned  to  any  of  the  constituents  of  the  underlying  scenario.  Cer¬ 
tainly,  the  word  hammer  most  strongly  constrains  the  filler  of  the  Instrument  role  in  The  hoy 
broke  the  vase  with  the  hammer ,  but  the  other  words  contribute  to  the  specification  of  the  filler 
of  this  role,  and  hammer  contributes  to  the  specification  of  the  fillers  of  the  other  roles.  Com¬ 
pare  The  prisoner  struck  the  rock  with  the  hammer  and  The  boy  broke  the  vase  with  the  feather. 
The  former  suggests  a  heavier  hammer;  the  latter  suggests  an  extremely  fragile  vase  (if  we  give 
an  instrumental  reading  to  the  with-NP). 

Toward  a  More  Complete  Model  of  Sentence  Processing 

As  we  have  already  made  clear,  the  model  that  we  have  described  in  this  chapter  is  far 
from  a  complete  model  of  the  psychological  processes  involved  in  sentence  processing.  It  does 
not  deal  with  the  fact  that  sentence  processing  is  an  on-line  process,  a  process  that  unfolds  in 
real  time  as  each  word  is  heard.  It  does  not  deal  with  the  integration  of  processed  sentences 
into  larger  contextual  frameworks.  It  does  not  handle  anaphora  and  other  referential 
phenomena,  or  tense,  aspect,  or  number.  No  attempt  is  made  to  deal  with  quantification  or 
scoping  issues.  The  model  even  lacks  a  way  of  distinguishing  different  tokens  with  identical 
fcatural  descriptions.  Thus  it  does  not  explicitly  designate  separate  dogs  in  dog  eat  dog  and 
only  one  dog  in  The  dog  chased  himself.  Finally,  the  model  completely  ignores  the 


I 


McClelland  and  Kawamoto 
April  30,  1986 


Sentence  Processing 
43 


complexities  of  syntax.  For  the  present  model,  a  sentence  can  come  in  only  one  rigidly  struc¬ 
tured  form,  and  no  embedded  clauses,  cleft  sentences,  or  passive  constructions  are  allowed. 

Clearly,  we  have  much  work  to  do  before  we  can  claim  to  have  a  model  that  is  in  any 
sense  complete  or  adequate.  The  question  is,  can  it  be  done  at  all?  Is  there  any  fundamental 
limitation  in  the  PDP  approach  that  will  prevent  the  successful  development  of  a  full-scale 
model  of  language  processing  that  preserves  the  positive  aspects  of  the  distributed  approach? 

Obviously,  the  proof  is  in  the  pudding.  But  we  think  the  enterprise  can  succeed.  Rather 
than  discuss  all  of  the  issues  raised  above,  we  will  discuss  one  that  seems  quite  central, 
namely,  the  application  of  PDP  models  to  the  processing  of  sentences  with  embedded  clauses. 
We  consider  several  different  ways  that  PDP  models  could  be  applied  to  the  processing  of  such 
sentences. 

Interfacing  PDP  mechanisms  with  conventional  parsers.  We  start  with  what  might 
be  the  simplest  view,  or  at  any  rate  the  most  conventional:  the  idea  that  a  model  such  as  ours 
might  be  interfaced  with  a  conventional  parser.  For  example,  we  might  imagine  that  a  parser 
similar  to  Marcus’s  PARSIFAL  (1980)  might  pass  off  the  arguments  of  completed  (or  possibly 
even  incomplete)  clauses  to  a  mechanism  such  as  the  one  we  have  proposed  for  case  role 
assignment  and  PP  attachment.  In  this  way,  the  ”role-assignmcnt  module”  could  be  used 
with  any  given  sentoid  and  could  be  called  repeatedly  during  the  processing  of  a  sentence  con¬ 
taining  embedded  clauses. 

Interfacing  our  model  with  a  conventional  parser  would  perhaps  provide  a  way  of  com¬ 
bining  the  best  of  both  conventional  symbol  processing  and  parallel  distributed  processing. 
W'e  are  not,  however,  particularly  inclined  to  follow  this  route  ourselves.  For  it  appears  that  it 
will  be  possible  to  implement  the  parser  itself  as  a  PDP  mechanism.  As  we  shall  see,  there  are 
at  least  three  ways  this  might  be  done.  One  involves  implementing  a  true  recursive  automaton 
in  a  PDP  network.  We  describe  this  method  first,  even  though  we  suspect  that  the  human 
parser  is  not  in  fact  such  a  machine.  After  describing  the  mechanism,  we  will  explain  our 
objections  to  this  view  of  the  human  sentence-processing  mechanism.  This  will  lead  us  to  sug¬ 
gest  two  other  mechanisms.  One  relies  on  the  connection  information  distribution  mechanism 
described  in  Chapter  16  to  program  a  parallel  net  to  process  sentences  of  indefinite  length  and 
embeddedness;  the  other  operates  iteratively  rather  than  recursively.  It  is  more  computation¬ 
ally  limited  in  some  respects  than  the  other  mechanisms,  but  the  limitations  appear  to  conform 
to  those  of  the  human  parser,  as  we  shall  see. 

.4  PDP  model  that  does  recursion.  It  turns  out  that  it  is  not  difficult  to  construct  a 
parallel  network  that  does  true  recursive  processing.  Hinton  (personal  communication) 
worked  out  the  scheme  we  will  describe  here  and  implemented  a  rudimentary  version  of  it  in 
1973.  While  such  a  mechanism  has  not  been  applied  directly  to  parsing,  the  fact  that  recursive 
processing  is  possible  suggests  that  there  is  no  reason,  in  principle,  why  it  should  not  provide  a 
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sufficient  basis  for  implementing  some  kind  of  parser. 

The  mechanism  consists  of  a  large  network  of  units.  Patterns  of  activation  on  these  units 
are  distributed  representations  of  a  particular  state  in  a  processing  sequence.  Processing 
occurs  through  the  succession  of  states.  The  units  are  divided  up  into  subnets  that  represent 
particular  parts  of  the  state.  One  important  subnet  is  a  set  of  units  that  provides  a  distributed 
pattern  corresponding  to  a  stack-level  counter. 

The  connections  in  the  network  arc  set  up  so  that  successive  states  of  a  routine  arc  asso¬ 
ciated  with  their  predecessors.  Thus,  when  one  state  is  in  place,  it  causes  another  to  follow  it. 
States  may  also  be  used  to  drive  actions,  such  as  output  of  a  line  segment,  say,  if  the  automa¬ 
ton  is  a  mechanism  for  recursively  drawing  figures,  as  Hinton's  was.  Processing,  then, 
amounts  to  carrying  out  a  sequence  of  states,  emitting  actions  (and  possibly  reading  input) 
along  the  way. 

Calling  subroutines  in  such  a  mechanism  is  not  particularly  difficult,  since  all  that  is 
required  is  to  associate  a  particular  state  (the  calling  state)  with  the  start  state  of  some  rou¬ 
tine.  Passing  parameters  to  the  called  routine  is  likewise  not  particularly  difficult;  in  the  sim¬ 
plest  case  they  can  be  parts  of  the  calling  state  that  are  carried  along  when  the  routine  is 
called. 

To  implement  recursion  in  such  a  network,  all  that  is  required  is  a  way  to  reinstate  the 
calling  state  when  a  routine  is  done.  To  do  this,  the  mechanism  associates  the  state  of  the 
stack-level  units  with  the  state  that  is  in  place  oser  the  rest  of  the  units,  using  an  associative 
learning  rule  to  adjust  connection  strengths  while  processing  is  taking  place.  These  associa¬ 
tions  arc  implemented  by  rapidly  changing  a  short-term  component  of  the  weights  whose 
long-term  values  implement  the  associations  that  allow  the  model  to  cycle  through  a  sequence 
of  states.  The  temporary  associations  stored  in  these  short-term  weights  arc  not  strong 
enough  to  overrule  the  long-term  weights,  but  they  are  sufficiently  strong  to  determine  the 
next  state  of  the  network  when  the  long-term  weights  leave  several  possibilities.  So,  at  the  end 
of  a  routine  at  stack  level  V,  the  network  associative!}’  reinstates  stack  level  X  -  I,  with  the 
rest  of  the  state  cleared.  This  associative  reinstatement  of  the  previous  stack-level  state  would 
be  based  on  long-term,  relatively  permanent  associations  between  states  corresponding  to 
adjacent  depths  in  the  stack.  This  reinstated  stack-level  state,  which  was  associated  in  the 
short-term  weights  with  the  calling  state  just  before  the  subroutine  call  occurred,  would  then 
simply  use  this  short-term  association  to  reinstate  the  pattern  that  existed  before  the  call.  (A 
"done’’  bit  would  have  to  be  set  to  keep  the  process  from  doing  the  call  again  at  this  point.) 

There  is  no  apparent  a  priori  limit  to  the  number  of  embedded  calls  that  could  be  carried 
out  by  such  a  network,  though  for  any  fixed  si/c  of  the  stack-level  subnet,  there  will  be  a 
corresponding  maximum  number  of  associations  that  can  be  learned  without  error.  Of  course, 
sinular  limitations  also  occur  with  all  other  stack  machines;  the  stack  is  always  of  finite  depth. 
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There  would  also  likely  he  interference  of  previous  calling  states  when  returning  from  any  par¬ 
ticular  level,  unless  the  learning  were  carefully  tuned  so  that  earlier  associations  with  a  particu¬ 
lar  stack  level  were  almost  completely  wiped  out  or  decayed  by  the  time  a  new  one  must  he 
used.  Care  would  also  be  necessary  to  avoid  crosstalk  between  stack-level  representations. 
However,  these  problems  can  be  overcome  by  using  enough  units  so  that  very  different  states 
are  used  to  represent  each  level  of  the  stack. 

Drawbacks  of  true  recursion.  The  scheme  described  in  the  previous  section  has  several 
fairly  nice  properties  and  deserves  considerably  more  exploration  than  it  or  some  obvious  vari¬ 
ants  have  received  to  date.  However,  it  docs  have  one  drawback  from  our  point  of  view  -  one 
that  it  shares  with  other,  more  conventional  implementations  of  truly  recursive  automata.  The 
drawback  is  that  the  calling  state  is  not  present  and  active  during  the  subroutine  call;  it  is 
effectively  inaccessible  until  it  is  reinstated  after  the  return. 

This  property  of  truly  recursive  schemes  limits  their  ability  to  simultaneously  consider 
binding  a  prepositional  phrase  at  each  of  two  levels  of  embedding.  Consider  the  sentences: 

(14)  The  boy  put  the  cake  the  woman  made  in  the  kitchen. 

( 1 5)  The  boy  saw  the  cake  the  woman  made  in  the  kitchen. 

Our  preferred  reading  of  the  first  of  these  two  sentences  has  the  boy  putting  the  cake  in  the 
kitchen,  rather  than  the  woman  preparing  it  there;  while  in  the  second  case,  the  preferred 
interpretation  appears  to  be  that  the  woman  made  the  cake  in  the  kitchen,  and  the  boy  saw  it 
at  some  unspecified  location.  Since  the  material  is  the  same  from  the  beginning  of  the  embed¬ 
ding  in  both  cases,  it  appears  that  the  demand  the  matrix  clause  material  ( The  boy  put  the  cake 
.  .  .)  makes  for  a  locative  argument  influences  the  decision  about  whether  in  the  kitchen  should 
be  bound  into  the  subordinate  clause.  While  it  may  be  possible  to  arrive  at  these  two  different 
readings  in  a  conventional  parser  by  backtracking  or  by  passing  aspects  of  the  calling  state 
along  when  the  subroutine  is  called,  it  would  seem  to  be  more  natural  to  suppose  that  the 
matrix  clause  is  actively  seeking  its  missing  locative  argument  as  the  embedded  material  is 
being  processed,  and  so  is  prepared  to  steal  in  the  kitchen  from  the  verb  in  the  embedded 
clause.  Thus,  it  appears  that  a  mechanism  capable  of  processing  at  different  levels  of  embed¬ 
ding  at  the  same  time  is  needed. 

.4  fixed-length  sentence  processor.  A  connectionist  parser  that  can.  in  principle,  handle 
this  kind  of  competition  among  alternative  attachment  decisions  at  different  levels  of  embed¬ 
ding  has  recently  been  developed  and  implemented  by  Fanty  (1985).  He  describes  a  mechan¬ 
ism  that  effectively  parses  a  sentence  at  many  levels  at  the  same  time.  The  parser  consists  of  a 
fixed  network  of  units.  Some  of  the  units  represent  the  terminal  and  nonterminal  symbols  of 
the  grammar;  other  units,  called  match  units,  represent  the  different  possible  expansions  of 
each  nonterminal.  The  symbol  units  are  easily  represented  in  a  table  in  which  the  columns 
represent  starting  positions  in  the  input  string  and  the  rows  represent  lengths.  There  is  a  unit 
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for  each  terminal  symbol  in  each  position  of  the  bottom  row,  and  there  is  a  unit  for  each  non¬ 
terminal  symbol  at  every  position  in  the  table;  that  is,  there  is  a  copy  of  each  nonterminal  unit 
for  every  possible  portion  of  the  input  that  it  could  cover.  For  each  of  these  there  is  a  set  of 
binder  units,  one  for  each  possible  expansion  of  each  nonterminal  unit. 

A  simple  version  of  the  table,  for  the  indicated  three-rule  grammar,  is  shown  in  Figure 
IS.  Only  a  subset  of  the  units  —  the  ones  that  would  become  active  in  the  parse  of  aabbb  — 
are  shown. 

The  parser  can  only  process  strings  up  to  a  predefined  maximal  length.  Fsscntially,  it 
processes  the  entire  sentence  in  one  two-pass  processing  sweep.  In  the  first,  bottom-up  pass, 
all  possible  constituents  are  identified,  and  in  the  second,  top-down  pass,  the  constituents  that 
fit  together  with  the  top  S  and  the  subconstituents  of  the  top  S  are  reinforced.  These  active 
units  represent  the  parse  tree. 

A  very  nice  feature  of  Fanty's  parse  is  that  it  takes  into  account  all  levels  of  the  parse 
tree  simultaneously.  This  allows  it  to  find  a  globally  satisfactory  parse  in  one  pair  of  sweeps, 
eliminating  possible  constituents  that  do  not  fit  together  with  others  to  make  a  globally 
acceptable  structure.  With  some  modifications  (different  degrees  of  strength  for  different  rules; 
continuous,  interactive  processing  as  opposed  to  a  single  pair  of  sweeps),  it  would  probably  be 
possible  to  implement  a  mechanism  that  could  choose  the  ’’better”  of  two  alternative  accept¬ 
able  parses,  as  people  seem  to  do  with  many  ambiguous  sentences.  The  parser  of  Selman  and 


starting  position 

Figure  IS.  Parsing  table  used  in  Fanty’s  multilevel  parser  Only  the  units  active  in  the  correct  parse  of  the 
string  jahbh  are  illustrated.  (From  Context-Free  Parsing  in  Conneetionisi  \et works  |1R  I  "4.  p  ,t|  by  XI  Parity. 
IdXS.  Rochester.  NY:  Lniversitv  of  Rochester.  Department  of  Computer  Science  C  opyright  I '1X5  hy  \|.  I  unty 
Reprinted  hy  permission.) 
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Hirst  (1985:  Selman,  1985),  which  is  similar  to  the  Fanty  parser  in  structure  hut  uses  simulated 
annealing,  appears  to  have  just  the  right  characteristics  in  this  regard. 

However,  this  kind  of  parser  does  have  some  drawbacks.  Most  importantly,  it  is  limited 
to  sentences  of  a  prespecified  length.  To  expand  it,  one  needs  not  only  to  add  more  units,  one 
needs  to  program  these  units  with  the  connections  that  allow  them  to  do  the  jobs  they  are 
needed  or  potentially  needed  to  do.  Second,  the  size  docs  grow  rather  quickly  with  the  allow¬ 
able  length  (sec  Fanty,  1985,  for  details). 

Fantv’s  model  is,  in  fact,  somewhat  reminiscent  of  the  TRACE  model  of  speech  percep¬ 
tion  (Chapter  15)  in  its  reduplication  of  dedicated  units  and  connections.  As  with  TRACE,  it 
may  be  possible  to  use  connection  information  distribution  (Chapter  16)  to  program  the 
necessary  connections  in  the  course  of  processing  from  a  single,  central  network  containing  the 
system's  long-term  knowledge  of  the  rules  of  English.  Indeed,  Fanty  (1985)  has  explored  an 
ofT-line  variant  of  the  connection  information  distribution  scheme;  his  version  learns  new  pro¬ 
ductions  locally,  and  sends  the  results  to  a  central  network  which  then  distributes  the  results 
to  the  rest  of  the  net,  ofT-line.  If  the  programming  of  the  connections  could  be  made  to  work 
on-line  in  the  course  of  processing,  as  with  the  programmable  blackboard  model  of  reading, 
we  would  have  a  mechanism  that  still  needs  large  numbers  of  units  and  connections,  but  cru¬ 
cially,  these  would  be  uncommitted  units  and  programmable  connections.  The  computational 
capability  of  the  machine  would  be  expandable,  simply  by  increasing  the  size  of  the  pro¬ 
grammable  network. 

Myths  about  the  limitations  of  PDP  models.  We  hope  that  these  last  two  sections 
will  help  to  dispel  some  widely  held  beliefs  about  computation  and  PDP  models.  We  will 
briefly  consider  two  variants  of  these  beliefs.  The  first,  quite  simply,  is  that  PDP  models  can¬ 
not  do  recursion.  Hinton’s  recursive  processor  needs  considerably  more  development  before 
we  will  have  a  working  simulation  that  proves  that  this  belief  is  wrong,  but  it  seems  clear 
enough  that  a  recursive  PDP  processor  will  be  available  fairly  soon.  In  fact,  it  is  likely  that  a 
slightly  different  approach  will  be  explored  more  fully  first:  Tourctzky  and  Hinton  (1985)  have 
recently  developed  a  PDP  implementation  of  a  production  system  that  can  do  rudimentary 
variable  binding,  and  at  present  it  appears  that  they  may  be  able  to  extend  it  to  perform  recur¬ 
sive  computations. 

The  second  belief  seems  to  follow  from  the  first:  It  is  that  PDP  models  arc  inherently 
incapable  of  processing  a  full  range  of  sentences.  We  say  it  only  seems  to  follow  because  it 
depends  upon  accepting  the  assumption  that  in  order  to  process  sentences  it  is  necessary'  to  be 
able  to  do  recursion.  Most  people  who  have  thought  computationally  about  sentence  process¬ 
ing  arc  familiar  with  sentence-processing  mechanisms  that  are  in  fact  recursive,  such  as  the 
AIN  (Woods,  1970)  or  the  Marcus  parser.  Since  sentences  arc  recursively  defined  structures, 
the  (implicit)  argument  goes,  the  mechanisms  for  parsing  them  must  themselves  operate  recur¬ 
sively. 
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Fanty’s  ( I ‘>S5 >  parser,  or  an  extension  of  it  incorporating  connection  information  distri¬ 
bution,  begins  to  suggest  that  this  may  not  be  so.  In  a  programmable  version  of  Fanty’s 
parser,  we  would  have  captured  the  essential  property  of  a  recursive  automaton  -  that  the 
same  procedural  knowledge  be  applicable  at  any  point  in  a  parse  tree.  Put  we  would  have 
captured  it  in  a  very  exciting  new  way,  a  way  that  would  free  the  mechanism  from  the  serial 
processing  constraint  that  prevents  conventional  recursive  mechanisms  from  being  able  to 
exploit  constraints  from  many  levels  at  the  same  time.  Connection  information  distribution 
mav  actually  permit  us  to  teap  the  benefits  of  simultaneous  mutual  constraints  while  at  the 
same  time  enjoying  the  benefits  of  being  able  to  apply  the  same  bit  of  knowledge  at  many 
points  in  processing  the  same  sentence. 

I  here  are  a  couple  of  caveats,  however.  One  is  that  connection  information  distribution 
is  very  expensive  computationally;  a  considerable  number  of  units  and  connections  are 
required  to  handle  the  input  to  and  output  from  the  central  knowledge  structures  (see  Chapter 
12  for  further  discussion),  and  the  resource  demands  made  by  even  a  fixed  version  of  Fanty’s 
parser  are  quite  high  already.  There  may  turn  out  to  be  ways  of  reducing  the  resource 
demands  made  by  the  Fanty  parser.  In  the  meantime,  it  is  worth  asking  whether  some  other 
approach  might  not  succeed  nearly  as  well  with  fewer  resources. 

Context-sensitive  coding,  iteration,  and  center  embedding.  There  is  one  more  belief 
about  sentences,  this  one  even  more  deeply  ingrained  than  the  last,  that  we  have  tacitly 
accepted  up  to  this  point.  This  is  the  belief  that  sentences  are  indeed  recursively  defined  struc¬ 
tures.  Clearly,  sentences  are  recursively  definable,  but  there  is  one  proviso:  Only  one  level  of 
center  embedding  is  allowed.  It  may  be  controversial  whether  the  "true"  competence  gram¬ 
mar  of  Fnglish  accepts  multiple  center-embedded  sentences,  but  people  cannot  parse  such  sen¬ 
tences  without  the  use  of  very  special  strategics,  and  do  not  even  judge  them  to  be  acceptable 
(G.  A.  Miller,  1962).  Consider,  for  example,  the  "sentence”: 

( 16)  The  man  who  the  girl  who  the  dog  chased  liked  laughed. 

The  unparsability  of  sentences  such  as  (16)  has  usually  been  explained  by  an  appeal  to  adjunct 
assumptions  about  performance  limitations  (e.g..  working-memory  limitations),  but  it  may  be. 
instead,  that  they  are  unparsable  because  the  parser,  by  the  general  nature  of  its  design,  is  sim¬ 
ply  incapable  of  processing  such  sentences.1’ 

It  should  be  noted  that  parsers  like  the  AIN  and  the  Marcus  parser  are  most  certainly 
not  intrinsically  incapable  of  processing  such  sentences.  Indeed,  such  parsers  arc  especially 
well  suited  for  handling  an  indefinite  number  of  center  embeddings.  Such  mechanisms  are 


13.  I  his  argument  has  been  made  previously  by  a  number  of  oilier  authors,  including  Reich  In  fact.  Reich 

proposed  an  iterative  approach  to  sentence  processing  that  is  similar,  in  some  respects,  to  the  one  we  consider  here. 
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clearly  necessary  for  processing  such  things  as  Lisp  expressions,  such  as  this  one  taken  from  a 
program  (actually,  itself  a  parser)  written  by  Je(F  Sokolov: 

((and  (eq  (car  (explode  arg))  V  ) 

(eq  (car  (reverse  (explode  arg)))  )) 

(implode  (reverse  (edr  (reverse  (edr  (explode  arg))))))) 

But  sentences  in  natural  language  arc  simply  not  structured  in  this  way.  Perhaps,  then,  the 
search  for  a  model  of  natural  language  processing  has  gone  down  the  garden  path,  chasing  a 
recursive  white  rabbit. 

One  approach  to  sentence  processing  holds  that  we  should  ask  much  less  of  a  syntactic 
parser  and  leave  most  of  the  work  of  sentence  processing  to  a  more  conceptual  level  of  pro¬ 
cessing.  This  position  is  most  strongly  associated  with  Schank  (1973),  and  Charniak  (1983)  is 
among  the  recent  proponents  of  this  view. 

Let  us  assume  that  the  job  of  the  parser  is  to  spit  out  phrases  encoded  in  a  form  that 
captures  their  local  context,  in  a  way  that  is  similar  to  the  wav  the  verb  learning  model, 
described  in  Chapter  18,  in  a  form  that  captures  their  local  context1.4  Such  a  representation 
may  prove  sufficient  to  allow  us  to  reconstruct  the  correct  bindings  of  noun  phrases  to  verbs 
and  prepositional  phrases  to  nearby  nouns  and  verbs.  In  fact,  we  suspect  that  this  kind  of 
local  context-sensitive  encoding  can  capture  the  attachments  of  \Ps  to  the  right  verbs  in 
’’tail-recursive"  sentences  like 

(17)  This  is  the  farmer  that  kept  the  cock  that  waked  the  priest  that  married  the  man  that 
milked  the  cow  that  tossed  the  dog  that  worried  the  cat  that  killed  the  rat  that  ate  the 
malt  that  lay  in  the  house  that  Jack  built.15 

Locally  context-sensitive  coding  will  begin  to  break  down,  however,  when  there  is  center 
embedding  -  specifically,  more  than  one  level  of  center  embedding.  Local  context  can  be  used 
to  signal  both  the  beginning  and  the  end  of  an  embedding,  but  cannot  be  used  to  specify 
which  beginning  of  an  embedding  the  material  just  after  the  end  of  an  embedding  should  be 
bound  to.  Thus  if  we  read  to  the  last  word  of 


14.  For  this  to  work  it  will  be  necessary  to  code  units,  not  in  terms  of  the  adjacent  words,  but  in  terms  of  neighbor¬ 
ing  constituents  more  abstractly  defined.  Thus,  in  The  girl  in  the  hat  saw  the  mouse  on  the  floor,  we  will  want  to  en¬ 
code  the  complex  \P  the  girl  in  the  hat  as  adjacent  to  the  verb  saw.  t  hus,  local  context  will  have  to  be  defined,  as 
it  is  in  Marcus  (19K0),  in  terms  of  constituents,  not  merely  in  terms  of  words.  Getting  this  to  work  will  be  one  of  the 
major  challenges  facing  this  approach. 


15  Adapted  from  F.  Johnson.  F.  R.  Sickels,  &  F.  C.  Sayers  (Eds.).  Anthology  of  Children's  f.iterature  (4th  ed..  p 
16).  l’JTO,  Boston:  Noughton-Mifflin. 
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(18)  The  man  who  the  girl  who  the  dog  chased  laughed 

we  may  not  know  whether  to  bind  laughed  (just  after  the  end  of  the  embedding)  to  the  man 
(NP  before  an  embedding)  or  to  the  girl  (a  different  NP  before  an  embedding).  If  we  bind  it  to 
the  man,  we  may  experience  a  false  sense  of  closure  --  this  sentence  is  ungrammatical  because 
it  has  only  two  verbs  for  three  clauses. 

These  suggestions  lead  us  to  the  following  conjecture:  It  may  be  possible  to  build  a  PDP 
language-processing  mechanism  that  works  iteratively  along  a  sentence,  building  constituents 
represented  by  distributed  patterns  conjunctively  encoded  with  their  local  context  of 
occurrence.  The  mechanism  would  need  a  way  of  unifying  constituents  on  two  sides  of  an 
intervening  embedding.  Txactly  how  this  would  be  done  remains  to  be  established,  but  as 
long  as  it  is  done  in  terms  of  a  mechanism  sensitive  only  to  the  local  context  of  the  consti¬ 
tuents  before  and  after  the  embedding,  it  may  succeed  where  there  is  a  single  embedding  but 
fail  in  multiply  embedded  sentences  where  there  arc  two  suspended,  incomplete  constituents 
that  the  mechanism  must  choose  between  completing. 

We  hope  it  is  clear  that  these  ideas  are  speculative  and  that  they  are  but  pointers  to 
directions  for  further  research.  Indeed,  all  three  of  the  directions  we  have  described  in  this  sec¬ 
tion  arc  only  just  beginning  to  be  explored  systematically,  and  it  is  unclear  which  of  them  will 
prove  most  attractive  on  closer  scrutiny.  We  mention  them  because  they  suggest  that  ways  of 
overcoming  some  of  the  apparent  limitations  of  PDP  mechanisms  may  not  be  very  far  beyond 
our  present  grasp,  and  that  it  may  soon  be  possible  to  retain  the  benefits  of  parallel  distributed 
processing  in  mechanisms  that  can  cope  with  the  structural  complexity  and  semantic  nuance 
of  natural  language. 
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